Inside the Black Box: Cracking AI and Deep Learning

Fine-Tuning LoRA: It's Not What You Think

When you fine-tune an AI model, what changes inside doesn't predict what changes outside. This week on Inside the Black Box, I break down why — and what it means for anyone auditing or regulating these systems.

When Fluent Answers Start Sounding True

May 2 · 00:15:20

This episode explores why smooth, coherent language can feel more credible than it is, and how processing fluency, familiarity, and authority cues shape what we believe. It also digs into why conversational AI is especially persuasive, from polished explanations to confident-sounding confabulations.

Why Your Brain Believes the Model

Apr 27 · 00:24:55

The Heuristic Loop You Can't Break from Inside

When Polished Answers Feel Finished

Apr 20 · 00:27:53

This episode explores fluency-as-validity: the way polished AI responses can make us feel like the work of judgment is already done. It also looks at why large language models are so effective at creating the sensation of clarity, and why mechanistic interpretability may be a way to push back against that enchantment.

What Seneca Teaches Us that Marcus Couldn't

Apr 12 · 00:17:26

716 features fire on both Seneca and Marcus Aurelius but stay dark for ad copy. The model learned Stoic philosophy, not just an author's style. Plus: why 'inert' features aren't all the same thing.

The Pattern Holds for Another Author

Apr 4 · 00:15:31

We trained a fresh LoRA on the letters of Seneca and ran the same analysis pipeline we used on Marcus Aurelius and advertising copy. Every structural finding replicated. The model organizes its adaptation into five clusters: one tight (features moving in lockstep) and four loose (features cooperating more independently). Seneca produced the cleanest clustering we've measured and the strongest workhorse cluster, a group of 141 features encoding philosophical argumentation with a causal effect more than three times stronger than anything in Marcus. Done in collaboration with John Holman.

The Pattern Holds

Mar 30 · 00:18:27

We replicated our Marcus Aurelius findings at a new layer, then threw the whole method at 12 commercial ad copy styles trained into a single LoRA. The patterns held, and the new domain revealed something we couldn't have seen before: the model organizes its adaptations by register family, not by individual style.

Cracking Open the Black Box

Mar 22 · 00:11:21

We opened the 65%. The features that resisted interpretation one at a time turned out to organize into five co-activation clusters with clear thematic identities and causal effects nearly ten times stronger than any individual feature. Second in a series with John Holman.

Inside a Fine-Tuned Language Model

Mar 12 · 00:18:50

In this short, single-segment episode, Arshavir Blackwell explains in one continuous narrative what neural networks are, how their simple units combine into powerful systems, and how learning by backpropagation sculpts their behavior. It introduces listeners to neural nets without equations or jargon.

What Counts as Structure? From Harris and Elman to Today’s Neural Nets

Mar 6 · 00:13:34

This episode of Inside the Black Box: Cracking AI and Deep Learning tells the story of an unexpected convergence in the history of language and AI. In 1995, Peter Bensch noticed that Zelig Harris, a mid‑century structural linguist, and Jeff Elman, a pioneer of simple recurrent networks, had independently uncovered the same deep insight about language: structure lives in patterns of use. Arshavir Blackwell, PhD, guides listeners through Harris’s world of distributional linguistics and operator grammar (where you infer structure from where words can substitute for one another) and contrasts it with Elman’s tiny recurrent neural networks that learn to predict the next word. Along the way, we see how these very different traditions arrive at the same place: hidden geometric structure in how language is used. From there, the episode bridges to today’s large language models and mechanistic interpretability, asking a deceptively simple question: what counts as "structure" inside a model? We explore how patterns, clusters, and features relate to genuine internal organization, and why Harris and Elman’s convergence still shapes how we think about circuits, features, and the geometry of meaning in modern AI.
