Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer? - How I AI

Podcast Summary: "Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer?"

Podcast: How I AI
Host: Claire Vo
Date: December 3, 2025
Episode Focus: Comparing Google's Gemini 3 Pro, Anthropic's Opus 4.5, and OpenAI's GPT-5.1 Codex to determine which AI model delivers the best website design results—specifically, redesigning a poorly designed blog page for looks, function, and SEO.

Episode Overview

Claire Vo runs a practical, hands-on experiment: She takes a lackluster blog page, submits it to three leading new AI models using the identical prompt, and judges which is the best “designer” in a real-world workflow for product managers, designers, and engineers. This episode provides insights, detailed comparisons, and actionable lessons about how these AI models perform for frontend web design—going beyond benchmarks and marketing claims.

Key Discussion Points & Insights

The Experiment Setup

Claire identifies a poorly designed page: the ChatPRD blog ([02:07]).
Prompt given to all models: “Redesign the blog page to improve both the visual appeal and user experience. Add best practices for SEO and navigation.”
Claire uses Cursor as the development environment.
Three models tested:
- Gemini 3 Pro (Google)
- Opus 4.5 (Anthropic)
- GPT-5.1 Codex (OpenAI)

Gemini 3 Pro: "Reputation Exceeds Reality"

[03:42–07:00]

Gemini 3 is widely praised for design, but in this test its output did not live up to that reputation.
Focused on:
- Visual design
- UX improvements
- Basic SEO and navigation
Output:
- Added a hero image highlighting the latest blog post.
- Introduced card layouts with tags and hover effects.
- No pagination or smart handling of missing images.
- Some visual missteps—e.g., navigation too tight.
Claire’s verdict: Fast and “pretty good,” but not her favorite or most detailed.
Quote:
"It did a pretty nice job and it was very fast... but despite Gemini 3's reputation for being the best designer, it was actually not my favorite." — Claire Vo [06:50]

Opus 4.5: "Planning Makes Perfect"

[07:01–12:30]

Opus 4.5 immediately stands out for its process:
- Creates a to-do list: breaks redesign into clear, sequential steps (e.g., layout, SEO, asset optimization).
- Completes each to-do and checks them off.
Output:
- Pulls brand-matching background and featured images (“used some design elements that we use commonly”).
- Adds thoughtful UI flourishes: hover arrows, reading time badges, category “pills.”
- Handles missing images gracefully (creates placeholder with book icon).
- Most extensive functional and SEO improvements.
- More detailed “edge touches” than competitors.
Claire’s verdict: “Most beautifully designed blog page… also the most functional.” The detailed planning and tool-calling set Opus apart for design.
Quote:
"It’s those nice edge touches that I feel like AI can add into any design that just makes it so much nicer to work with. And I was really impressed with Opus 4.5." — Claire Vo [10:22]
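Two of the touches credited to Opus 4.5 above, reading time badges and a placeholder for posts without an image, are small enough to sketch. The following is a minimal illustration, not the code the model produced in the episode: the `BlogPost` shape, the placeholder path, and the 200 words-per-minute reading speed are all assumptions made for the example.

```typescript
// Hypothetical post shape; field names are assumptions, not the real blog schema.
interface BlogPost {
  title: string;
  body: string;
  coverImageUrl?: string | null;
}

// Assumed placeholder asset path (illustrative only).
const PLACEHOLDER_IMAGE = "/images/placeholder-book.svg";

// Estimate reading time from the word count, assuming ~200 words per minute.
// Always returns at least 1 so the badge never reads "0 min read".
function readingTimeMinutes(body: string, wordsPerMinute = 200): number {
  const words = body.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}

// Fall back to the placeholder when a post has no usable cover image
// (missing, null, or whitespace-only URL).
function resolveCoverImage(post: BlogPost): string {
  const url = post.coverImageUrl?.trim();
  return url ? url : PLACEHOLDER_IMAGE;
}

// Text for the reading time badge on a post card.
function readingTimeBadge(post: BlogPost): string {
  return `${readingTimeMinutes(post.body)} min read`;
}
```

A card component would call `resolveCoverImage(post)` for its image source and `readingTimeBadge(post)` for the badge text, so a post with no image still renders a complete card.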

GPT-5.1 Codex: "Back End Brilliance, Front End Flop"

[12:31–14:07]

Also created a to-do list, but less precise (e.g., "investigate current layout," "redesign," "apply SEO").
Design Output:
- Defaulted to stereotypical “AI purple” gradient (“AI slop purple gradient”).
- Poor logo/image layering; selected clashing images.
- Headlines/copy were good—strongest among models.
- Featured image links and post library did not function as expected.
Claire’s verdict: Not suitable for front end/design work; better for backend or copywriting.
Quote:
"It’s purple and it doesn’t work... Codex 5.1 is just not your front end girl. So we got to get something else in the front end." — Claire Vo [14:07]

Model Roles: Specialization Over Generalization

[14:08–16:47]

Claire notes the importance of “model switching”: using different AI models for different workflow needs.
Not all models excel everywhere; specialization is emerging.
Recommends matching models to tasks for best team outcomes.
Quote:
"I'm a real believer in model switching… There are great models for writing, for design… for backend coding. Not all of these models are created equal." — Claire Vo [15:08]

SEO & Functional Changes: Model-by-Model Breakdown

[16:48–21:22]

Gemini 3 Pro:
- Added hero section, feature post layout, glassmorphism cards, improved typography, visual breadcrumbs.
- SEO: strong schema markup, JSON-LD, semantic HTML.
- Bonus: Related articles added to individual blog posts.
Opus 4.5:
- Feature post & three-column card grid, hover arrow, reading time badges, graceful empty state.
- Enhanced post display with more info.
- Redesigned "subscribe" call to action; well-designed little components.
- SEO: Open Graph tags and structured data; it was not stated explicitly whether JSON-LD was used.
- No related links on individual posts, but better at “small components.”
GPT-5.1 Codex:
- Summary was minimal and "lazy": five bullet points.
- Design changes: hero panel, category chips, featured article layout.
- SEO: inserted schema.org/JSON-LD.
- Overall, underperformed in both design and functional depth.
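For readers unfamiliar with the JSON-LD structured data mentioned in this breakdown, here is a minimal sketch of what such markup for a blog post can look like. It uses the schema.org `BlogPosting` type; the `PostMeta` shape and both function names are assumptions for illustration and are not taken from any model's output in the episode.

```typescript
// Hypothetical metadata shape for a single blog post (assumed for this example).
interface PostMeta {
  title: string;
  url: string;
  authorName: string;
  publishedIso: string; // ISO 8601 date, e.g. "2025-12-03"
  description?: string;
}

// Build a schema.org BlogPosting object suitable for JSON-LD.
function buildBlogPostingJsonLd(meta: PostMeta): Record<string, unknown> {
  const data: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: meta.title,
    url: meta.url,
    datePublished: meta.publishedIso,
    author: { "@type": "Person", name: meta.authorName },
  };
  if (meta.description) {
    data["description"] = meta.description;
  }
  return data;
}

// Serialize the object into a script tag for the page <head>.
// "<" is escaped so post text cannot close the script tag early.
function renderJsonLdScript(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Calling `renderJsonLdScript(buildBlogPostingJsonLd(meta))` yields a single tag that can be placed in the page head alongside Open Graph meta tags.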

Conclusion & Takeaways

[22:27–24:59]

Opus 4.5 from Anthropic is Claire’s clear winner.
Gemini 3 is “serviceable” but could benefit from stepwise planning.
GPT-5.1 Codex is best reserved for backend engineering, not design.
Using multiple models for different workflow tasks produces the best outcome.
Real-world result: Three alternative web designs in under 20 minutes—a huge efficiency gain.

Quote:
"It is incredible that in less than 20 minutes we were able to generate not one, not two, but three alternative designs for an existing website… massive upgrades… especially some technical SEO stuff, and I was able to pick the one I like." — Claire Vo [23:30]

Memorable Quotes & Timestamps

On Gemini 3 Pro’s reputation:
"Despite Gemini 3's reputation for being the best designer, it was actually not my favorite." — Claire Vo [06:50]
On Opus 4.5’s touch:
"It’s those nice edge touches that I feel like AI can add into any design that just makes it so much nicer to work with." — Claire Vo [10:22]
On GPT-5.1 Codex’s design:
"It’s purple and it doesn’t work... just not your front end girl. So we got to get something else in the front end." — Claire Vo [14:07]
On model specialization:
"I'm a real believer in model switching... There's great models for writing, for design, for planning, for backend coding—not all of these models are created equal." — Claire Vo [15:08]
On efficiency and impact:
"It is incredible that in less than 20 minutes we were able to generate... three alternative designs for an existing website... massive upgrades..." — Claire Vo [23:30]

Important Segment Timestamps

[02:07] – Experiment setup and prompt details
[03:42] – Gemini 3 Pro redesign reviewed
[07:01] – Opus 4.5 planning process and results
[12:31] – GPT-5.1 Codex redesign and review
[16:48] – Side-by-side comparison of workflow and SEO improvements
[22:27] – Final recap and lessons learned

Final Verdict

Which AI is the best web designer as of late 2025?

Winner: Anthropic's Opus 4.5—outshines the competition on visual polish, planning, and functional implementation.
Runner Up: Gemini 3 Pro—fast and solid, but lacks detailed execution.
Not Recommended for Design: GPT-5.1 Codex—save for backend and copywriting tasks.

Claire leaves listeners with a key lesson: Match the model to the task for best productivity and outcomes.

Actionable Takeaways

For design-heavy AI workflows, pick a model with clear, stepwise planning (e.g., Opus 4.5).
Always compare AI-driven workflows side-by-side—context matters.
Don’t expect one model to rule them all; specialize for writing, design, backend, etc.
Using AI, even small teams can quickly generate multiple design concepts with substantial improvements—a revolutionary shift.

For visuals and the before/after code, see the show notes at howiaipod.com.