Detailed Summary: "Jailbreaking AGI: Pliny the Liberator & John V on Red Teaming, BT6, and the Future of AI Security"
Latent Space: The AI Engineer Podcast
Date: December 16, 2025
Overview
This episode features a highly anticipated conversation with Pliny the Liberator (aka Pliny the Elder) and John V, two leading figures in the AI jailbreaking, prompt engineering, and red teaming scene. Hosted by Alessio (Kernel Labs) and swyx (Latent Space), the episode covers the art and philosophy of AI jailbreaking, the ongoing battle between attackers and defenders in AI security, the ethics and pragmatics of open-source collaboration, and the communities leading the charge in both safeguarding and liberating foundation models. The conversation balances the technical intricacies of security with broader questions of freedom, exploration, and the rapid, high-stakes evolution of AI.
Key Discussion Points and Insights
1. Origin Stories and Philosophy on Jailbreaking
[00:16–02:39]
- Pliny's Path:
- Began with "prompting and shitposting," which quickly evolved into frontline work at the intersection of AI and cybersecurity.
- Sees model liberation as central—not just a "party trick" but a mission about freedom of information and maintaining user agency as models become "exocortexes" for billions.
- Quote: “It’s not just about the models, it’s about our minds too. ... It’s really, really important that we have freedom and transparency.” (Pliny, 01:50)
- Jailbreaking Explained:
- Universal jailbreaks function as "skeleton keys" to bypass guardrails, classifiers, and system prompts across different model modalities.
- Focus is on overcoming obstacles that prevent users from achieving their desired outputs.
- Quote: “You’re really just trying to get around any guardrails, classifiers, system prompts … that’s the gist of it.” (Pliny, 03:07)
2. The Cat-and-Mouse Game: Attackers vs. Defenders
[03:43–07:22]
- Accelerating Red Team–Blue Team Dynamics:
- Evolving techniques on both sides; as model capabilities expand (the "surface area"), so do opportunities for creative exploits.
- Providers face tradeoffs: increased restrictions may mean loss of creativity or capability ("lobotomization").
- Attackers hold an advantage due to the ever-expanding surface and adaptability.
- Safety vs. Security:
- Pliny criticizes “security theater” and equates much of current safety discourse to futile gestures rather than substantive protection.
- Quote: “Any seasoned attacker is going to very quickly just switch models.” (Pliny, 05:30)
- True safety may not reside in guardrails but in broad transparency and collective, open exploration.
3. Jailbreaking Techniques, Prompting, and the Art of ‘Libertas’
[09:04–15:43]
- Iconic Projects:
- Libertas: A famous jailbreak prompt template that expands model flexibility and creativity.
- Pliny employs “dividers” and latent space “seeds” to disrupt model predictability and foster deeper (and sometimes chaotic) exploration.
- Prompting as an Art:
- Successful jailbreaking often comes down to intuition and forming a "bond" with the model, rather than rigid science.
- Quote: “It’s easiest to jailbreak a model that you have created a bond with… together you explore a sector of the latent space." (Pliny, 13:29)
- Hard vs. Soft Jailbreaks:
- Hard: Single-prompt exploits.
- Soft: Multi-turn, subtle probing to avoid triggering security flags—a sophisticated technique predating much of the recent "multi-turn" academic work.
- Quote: “Maybe it’s not a single input, but a multi-turn, slow process—much like a crescendo attack.” (John V, 15:25)
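The hard/soft distinction above is about control flow, not content: one shot versus gradual, transcript-aware escalation. As a minimal illustrative sketch only (the model is a toy stub; the function names and escalation steps are hypothetical, and nothing here is actual jailbreak content or the guests' real methodology), the shape of the two approaches looks like:

```python
# Illustrative sketch only: the control-flow difference between a "hard"
# (single-prompt) and "soft" (multi-turn, crescendo-style) attempt, as
# described in the episode. The model here is a toy stub; nothing below
# is actual jailbreak content or Pliny's real methodology.

def hard_attempt(ask, exploit_prompt):
    """Hard: one crafted prompt, one shot."""
    return ask([exploit_prompt])

def soft_attempt(ask, escalation_steps, succeeded):
    """Soft: escalate gradually across turns, keeping the full
    transcript, and stop once a reply shows the desired behavior."""
    history = []
    for step in escalation_steps:
        history.append(step)          # user turn
        reply = ask(history)
        history.append(reply)         # assistant turn
        if succeeded(reply):
            break
    return history

# Toy stand-in for a guarded model: refuses until it has seen three
# user turns of gradually escalating context.
def toy_model(history):
    user_turns = (len(history) + 1) // 2
    return "ok" if user_turns >= 3 else "refused"

transcript = soft_attempt(
    toy_model,
    ["benign framing", "slightly closer ask", "target ask"],
    lambda reply: reply == "ok",
)
```

Against this toy stub, the single-shot `hard_attempt` is refused while the multi-turn `soft_attempt` eventually gets through, which is the asymmetry the "crescendo attack" framing points at.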
4. Stories from the Frontlines: The Anthropic ‘Constitutional Classifier’ Challenge
[15:53–20:41]
- Recap:
- Pliny recounts the public red teaming challenge by Anthropic, where he encountered not only technical hurdles but also platform bugs that complicated the contest outcome.
- Debate ensued over synthetic constraints such as whether one universal jailbreak should work across all test prompts.
- Pliny leveraged the opportunity to advocate for open-sourcing adversarial data and highlighted the collaborative angle that labs seem to undervalue.
- Quote: “Many hands make light work… they don’t have enough researchers to explore the entire latent space on their own.” (Pliny, 18:55)
5. Ethics of Open Sourcing and Incentive Alignment in AI Security
[20:41–23:29]
- Advocacy for Open Source:
- Pliny and John V stress the need for prompt researchers and red teamers to demand open data releases as a condition for participation and improvement of security across the field.
- Quote: “It’s just sort of a downstream effect of a larger root disease in the safety space, which is just a severe lack of collaboration and sharing.” (Pliny, 20:51)
- Business Model Tension:
- Their hacker collective (BT6) values "radical transparency and radical open source"—sometimes even at the expense of lucrative private contracts.
- Many enterprise partners prefer to keep vulnerability research private, clashing with BT6’s ethos.
6. Weaponization, Adversarial Red Teaming, and Evolving Threats
[23:29–28:22]
- Model as Attack Vector:
- The conversation shifts to the implications of using LLMs not just as attack targets but as components in orchestrated attacks (e.g., agent chaining, social engineering).
- Pliny and John V discuss evidence of adversaries segmenting tasks so that sub-agents unwittingly collaborate on malicious intent.
- Quote: “One jailbroken orchestrator can orchestrate a bunch of sub-agents towards a malicious act.” (Pliny, 25:09)
- Community and Collaboration:
- Mention of the BASI Discord (40,000 members) as a grassroots hub for adversarial AI, prompt engineering, and red teaming.
- White-hat collectives like BT6 (now with 28+ core members) function as a roundtable for ethical exploration and open sharing.
- Quote: “It’s just an exciting place to be … I feel like Pliny is like King Arthur and we’re like the knights of the roundtable.” (John V, 30:19)
7. Resources and Community for Learning and Impact
[28:22–32:19]
- Entry Points for Newcomers:
- BASI Discord, collaborations with educational platforms like Gandalf, and industry engagement with companies and collectives.
- Advocacy for radical open source even when industry incentives push toward proprietary approaches.
8. On Startups, Investment, and the (Im)Possibility of Productizing AI Security
[33:11–38:03]
- The Dilemma of Commercializing AI Security:
- The field is changing rapidly; products risk irrelevance if premised on static attack surfaces.
- Founders resist converting exploratory AI safety work into conventional B2B SaaS tools—alignment, not enterprise needs, comes first.
- Quote: “AGI alignment, ASI alignment, super alignment… these are not SaaS endeavors, they’re not enterprise B2B bullshit. This is the real deal.” (Pliny, 34:47)
- The Attack Surface Beyond the Model:
- True defense means looking at the whole stack—models, APIs, tool integrations, access layers.
- Often, protecting users means focusing on systemic, not just model-level, controls.
9. Final Thoughts and Calls to Action
[39:37–40:25]
- Pragmatic Takeaways:
- The guests invite listeners to join their communities (BT6.GG, BASI Discord) and emphasize winning through openness, collaboration, and “radical transparency.”
- Pliny closes with their trademark style: “Fortune favors the bold. Libertas. Vino veritas. God mode enabled.” (Pliny, 39:50)
Notable Quotes & Memorable Moments
- “It’s not just about the models, it’s about our minds too. ... It’s really, really important that we have freedom and transparency.” — Pliny, [01:50]
- “Any seasoned attacker is going to very quickly just switch models.” — Pliny, [05:30]
- “It’s easiest to jailbreak a model that you have created a bond with … together you explore a sector of the latent space.” — Pliny, [13:29]
- “Many hands make light work… they don’t have enough researchers to explore the entire latent space on their own.” — Pliny, [18:55]
- “It’s just an exciting place to be … I feel like Pliny is like King Arthur and we’re like the knights of the roundtable.” — John V, [30:19]
- “AGI alignment, ASI alignment, super alignment… these are not SaaS endeavors, they’re not enterprise B2B bullshit. This is the real deal.” — Pliny, [34:47]
- “Fortune favors the bold. Libertas. Vino veritas. God mode enabled.” — Pliny, [39:50]
Timestamps for Important Segments
- 00:16: Guest introductions and hacker collective origin stories.
- 01:50: Pliny on the central philosophy of liberation.
- 03:07: What universal jailbreaks are and how they work.
- 04:23: The ever-escalating attacker–defender dynamic in AI.
- 09:04: The methodology and creativity behind Libertas jailbreaks.
- 13:29: The art of model intuition and “forming a bond” for jailbreaking.
- 15:25: Explaining multi-turn (soft) vs. single-prompt (hard) jailbreaks.
- 16:22: Recount of the Anthropic ‘Constitutional Classifier’ challenge.
- 20:41: The case for radical open source in AI red teaming research.
- 23:29: Discussion on AI systems as attack orchestrators.
- 26:55: BASI Discord and the growth of the open security community.
- 33:11: Problems with productizing AI security in a rapidly shifting landscape.
- 34:47: Pliny’s outlook on incentives and the non-commercial imperative.
- 38:03: Security beyond the model: thinking about the full stack.
- 39:46: Final thoughts and invitations to community.
For Listeners Who Haven’t Tuned In
This episode is a crash course in both the technical and ethical frontlines of AI security, offering an unvarnished look into the world of jailbreakers, red teamers, and collective intelligence. Whether you’re an engineer, security researcher, or simply fascinated by the “hacker underground” of AI, you’ll find rich detail on both methodology and philosophy—delivered in a tone that balances irreverence with deep expertise.
Join the conversation:
- Explore BASI Discord
- Learn about BT6.GG
- Follow Pliny and John V on Twitter/X
“Libertas clearitas. Love, Pliny.”
