OpenAI Podcast
Episode 15 – Inside the Model Spec
Date: March 25, 2026
Host: Andrew Mayne
Guest: Jason Wolf, Researcher on the Alignment Team at OpenAI
Episode Overview
This episode delves deep into the "model spec," OpenAI's public document outlining their expectations for AI model behavior. Host Andrew Mayne sits down with Jason Wolf from OpenAI's Alignment team to demystify what the model spec is, how it shapes the conduct of AI models, and why its transparency is crucial for developers, users, and the public. The conversation offers practical insights into aligning powerful AI systems with human values, managing the trade-offs between honesty and friendliness, and the ongoing process of updating and evolving the spec.
Key Discussion Points & Insights
1. What Is the Model Spec?
- Definition and Purpose
- The model spec is a comprehensive document (~100 pages) detailing OpenAI's high-level decisions about how models should behave (08:08).
- It is designed to be primarily understandable to humans—employees, users, developers, policymakers, and the public (01:19).
- It is not:
- A guarantee of perfect compliance by current models.
- An implementation artifact (i.e., not written for the models themselves).
- A complete description of the entire ChatGPT system.
- An exhaustive policy manual but rather a high-level communication of intentions (01:19).
- Iterative Alignment
- The spec serves as a "north star," guiding ongoing alignment efforts as models evolve (20:06).
- Continuous feedback from both internal and external sources shapes its evolution (17:55).
2. Transparency and Public Feedback
- The model spec is open source and available on GitHub (06:33).
- Anyone can provide feedback—either through the product interface or directly to the OpenAI team (e.g., via Twitter) (06:33).
- Notable: Many changes to the spec have resulted directly from public or developer input.
3. Structure and Hierarchy ("Chain of Command")
- Chain of Command
- Outlines precedence when instructions conflict—OpenAI policies > developer instructions > user instructions (11:33).
- Most policies are placed at the lowest authority level to preserve model "steerability" (user control), but safety-critical rules are higher (11:33).
4. Balancing Honesty, Friendliness, and Confidentiality
- Santa Claus Example
- When a child asks "Is Santa Claus real?" the model must balance honesty and preserving the "magic," remaining non-committal if needed (13:32).
- Honesty vs. Confidentiality
- Initial spec drafts prioritized confidentiality for developer instructions, but this led to unintended behaviors (e.g., secret-keeping conflicting with honesty).
- Now, honesty is placed above confidentiality (15:10).
5. How the Model Spec Actually Influences Models
- There isn’t a one-to-one mapping between the spec and model behavior—alignment is achieved through processes like deliberative alignment during model training (10:10).
- Model behavior is nudged to better match the spec over time through retraining and evaluation.
6. Evaluating and Updating the Spec
- The process is open, with contributions from across OpenAI (and increasingly the public) (17:55).
- Real-world incidents—like "the sycophancy incident"—feed back into the specification (17:55).
- New areas (e.g., multimodal, autonomy, under-18 principles) are added as product features evolve.
7. Smaller Models and Spec Compliance
- New smaller models (e.g., GPT5.4 Mini, Nano) are also aligning well with the spec, in part due to advances in "deliberative alignment"—where models are explicitly trained to understand and reason about policies (22:02, 22:19).
8. "Chain of Thought" and Detecting Deception
- Requiring models to explain their reasoning ("chain of thought") allows researchers to detect strategic deception or misunderstanding—even when the output seems benign (23:13).
9. Comparison with Other Labs’ Approaches
- OpenAI's spec functions as a public behavioral interface, while other labs (e.g., Anthropic's "Constitution") focus more on internal implementation guidance (24:16).
- Both approaches are compatible and even complementary (24:30).
10. Scope and Longevity of the Model Spec
- The scope covers "broadly everything" relating to model behavior, subject to time and clarity constraints (26:53).
- Jason predicts continued relevance—even with future human-level AIs—since explicit principles will always set clear expectations and reflect unique product decisions (27:41).
11. Developer Guidance and Broader Adoption
- Developers building on the OpenAI API are encouraged to understand the model spec to get the behaviors they want from models and to inspire their project-specific "mini specs" (31:13).
- Crucial balances: Being precise and honest while remaining actionable, especially through examples (31:38).
12. Personal Motivation and Sci-Fi Parallels
- Jason’s longstanding fascination with intelligence and AI (33:52).
- The model spec's goals echo Asimov's Laws of Robotics, with even subtler considerations about hierarchy and nuance (34:41).
13. AI’s Role in Shaping the Spec
- AI is already helpful for finding issues, testing policy application, and brainstorming tricky edge cases for the spec—though the foundational writing is still human-led (36:20).
- Jason muses about asking AI to draft a spec someday (37:13).
Memorable Quotes & Notable Moments
- "The spec often leads where our models actually are today. Aligning models to spec is always an ongoing process."
– Jason (00:14, 20:06) - "The goal is always primarily to be understandable to humans."
– Jason (01:19) - "It's kind of crazy that you can ask these models literally anything and they'll try to respond."
– Jason (03:58) - "At the heart of the spec is this thing we call the chain of command... If there are conflicts between instructions, the model should prefer OpenAI instructions to developer instructions to user instructions."
– Jason (11:33) - "We focus on honesty being really important. But there are really hard interactions. Honesty, full honesty may not be the best approach."
– Jason (15:10) - "Now honesty is definitely above confidentiality in the spec."
– Jason (15:22) - "Changes get driven by a variety of different sources... OpenAI believes in iterative deployment."
– Jason (17:55) - "If the output contradicts the spec, but we actually think the output is good, then maybe the resolution is to go back and change the policies of the spec."
– Jason (20:06) - "The thinking models generally follow the spec better... because they're smarter and... actually understand the policies."
– Jason (22:19) - "Having the chain of thought is really completely essential... you can see that, no, actually the model's misbehaving. It's being very strategic about this."
– Jason (23:21) - "I think these aren't necessarily competing approaches. I think both of these could be valuable."
– Jason, on constitutions vs. specs (24:30) - "Five years is a lot in AI years, but yeah, I definitely hope so [that the model spec will be relevant in five or ten years]."
– Jason (27:48) - "Developers... probably useful to at least have a high-level picture of the model spec and how it works."
– Jason (31:38) - "Sometimes a picture is worth a thousand words... spelling that out and how the principles should be applied suddenly makes the principles 100 times clearer."
– Jason (31:38) - "I definitely never expected to see this level of capability in my lifetime."
– Jason (34:01)
Segment Timestamps
- 00:00 – Introduction to the episode and guest.
- 01:08 – First deep dive: What is the model spec, and what isn’t it?
- 03:52 – How the model spec works in practice; scope and structure.
- 06:23 – Transparency, public access, and feedback.
- 07:34 – The origin story: How the model spec idea developed at OpenAI.
- 10:10 – Translating the spec into model behavior; deliberative alignment.
- 11:24 – The "chain of command" and policy hierarchy.
- 13:32 – Honesty, friendliness, and difficult edge cases (Santa Claus).
- 15:10 – Evolving interplay between honesty and confidentiality.
- 17:55 – How the spec is updated; iterative and transparent process.
- 20:06 – Resolving misalignments between spec and model outputs.
- 22:19 – Smaller models, deliberative alignment, and compliance.
- 23:21 – Importance of "chain of thought" in research.
- 24:16 – Comparison with other “constitutions” (e.g., Anthropic's).
- 26:53 – Broad scope and practical constraints of the spec.
- 27:41 – The future of model specs as AI matures.
- 31:13 – Advice for developers: understanding and using the spec.
- 33:52 – Jason's personal history with AI, motivations.
- 34:41 – Sci-fi parallels: Asimov’s Laws and the spec.
- 36:20 – How AI is already helping with the spec document.
- 37:13 – Speculating on having AI write specs in the future.
- 37:20 – Closing remarks.
Takeaways
- The model spec is a living, public guide to how OpenAI expects its models to behave, balancing transparency, control, and flexibility.
- It is both a technical and ethical blueprint—rooted in public engagement, example-driven clarification, and constant iterative refinement.
- Balancing honest, safe, and helpful behavior often requires trade-offs and nuanced judgment, especially as AI capabilities grow.
- OpenAI sees the model spec—or similar guiding documents—as likely to remain essential, helping align both today’s and tomorrow’s AI systems with human values.
This summary captures the central themes, practical examples, and philosophical challenges discussed throughout the episode, offering listeners and developers a clear window into the role and future of the model spec at OpenAI.
