Statecraft Podcast Summary
Episode Title: Ten Thoughts on Government Data
Host: Santi Ruiz
Guest/Essayist: Violet Buxton Walsh, Immigration Fellow at IFP
Date: March 5, 2026
Link: www.statecraft.pub
Episode Overview
This episode departs from Statecraft’s usual interview format; instead, Violet Buxton Walsh presents her essay “Ten Thoughts on Government Data.” Drawing on her experience building and managing datasets on international student employment, Violet distills lessons for policy analysts, researchers, and policy entrepreneurs on the quirks, limits, and surprising opportunities within U.S. government data. The essay explains why government data is both essential and maddening, and offers practical advice for working with it.
Key Discussion Points & Insights
1. Administrative Data Has Major Gaps
(01:20)
- Government systems, like SEVIS, often fail to collect or maintain even seemingly fundamental data.
- Gaps stem not only from human error; sometimes entire systems simply don’t exist.
- Even the federal government lacks information one would expect it to have at its fingertips, e.g., up-to-date details on which visa holders are in the country or the current employers of international students.
“We simply cannot know things one might assume we do, like which visa holders are currently in the country or the employer of every working international student.” (02:15, Violet)
2. If Something Seems Off, It Often Is
(03:20)
- Government datasets may have few users, meaning massive errors can go unnoticed for months.
- If data looks wrong, it often is—don’t be afraid to trust your gut.
- In 2024, U.S. data was missing 200,000 international students, and the error went unnoticed until a single user flagged it.
“It’s less likely to be a failure of your understanding than you might expect.” (03:55, Violet)
3. If It’s a Question on a Form, You Can Find Data On It
(05:00)
- Admin data is really just answers to forms.
- Reading the forms or knowing who understands them is vital; key data may be tucked into obscure places.
- Example: an immigration lawyer’s expert knowledge revealed that H-1B visa recipients’ wages are captured on specific forms.
“Learning an agency’s paperwork can save you time, too.” (05:38, Violet)
4. We’re Not Actually Counting
(06:20)
- Many government numbers are samples extrapolated to populations, not literal counts.
- Misreading these estimates as counts has produced widely shared but false policy narratives, in which apparent employment gains or losses were artifacts of population scaling rather than real growth or decline.
- Example: shifts in the reported immigrant population can artificially inflate native-born counts because of how the data is scaled.
“That 2 million more Americans statistic was the result of using data in ways the statistical agencies explicitly tell users not to.” (07:06, Violet)
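The scaling effect described in this point can be illustrated with a minimal sketch. It assumes a simplified survey in which raw sample counts are scaled so that they always sum to a fixed population total; every number and name below is hypothetical and is not taken from the episode or from any agency's actual weighting method.

```python
def scaled_estimates(sample_counts, population_control):
    """Scale raw sample counts so the estimates sum to a fixed population total."""
    sample_total = sum(sample_counts.values())
    weight = population_control / sample_total  # people represented per respondent
    return {group: count * weight for group, count in sample_counts.items()}


# Hypothetical fixed population total used to scale both months.
POPULATION_CONTROL = 1_000_000

# Hypothetical respondents: the only change is the mix of who answered.
month_a = {"native_born": 8_500, "foreign_born": 1_500}
month_b = {"native_born": 8_600, "foreign_born": 1_400}

est_a = scaled_estimates(month_a, POPULATION_CONTROL)
est_b = scaled_estimates(month_b, POPULATION_CONTROL)

# The estimated native-born population "grows" even though the total is fixed.
native_change = est_b["native_born"] - est_a["native_born"]
print(f"Apparent native-born change: {native_change:,.0f}")
```

Because the total is held fixed, any drop in the foreign-born share of respondents shows up as an equal rise in the native-born estimate, whether or not anyone's actual circumstances changed.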
5. Nobody Understands Statistics
(08:00)
- Statistical nuance is effectively lost in policy circles; simplicity is paramount.
- Policy discussions require you to over-communicate: “Assume you’re talking to an audience of fifth graders.”
- Visuals and footnotes can help, but expect interpretation to be superficial.
“Never, ever assume the numbers speak for themselves.” (08:35, Violet)
6. Nobody Knows How the Whole Thing Works
(09:10)
- Most users (even agency insiders) only understand small parts of a data system.
- Interdepartmental collaboration is essential for holistic analysis.
- Violet’s team could build the OPT Observatory thanks to their complementary expertise across law, policy, software, and engineering.
“Hardly anyone ends up drawing connections between the different parts of the system.” (09:47, Violet)
7. Government Data Systems Were Built for Administration—Not Analysis
(10:05)
- Data systems are designed to serve bureaucratic needs, not to yield big-picture policy insights.
- Queries are built to check status or approve individual cases, not to summarize or analyze in aggregate.
- Extracting anything novel often means creative, time-intensive workarounds.
“These systems can act more like audit trails than flexible databases.” (10:38, Violet)
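To make the "audit trail" comparison concrete, here is a rough sketch under stated assumptions: a hypothetical log of case status changes (the case IDs, dates, and statuses are invented for illustration). Looking up one case is a simple filter, while an aggregate question such as "how many cases are in each status right now?" requires replaying the whole log.

```python
from collections import Counter

# Hypothetical audit-trail rows: (case_id, ISO date, status).
EVENTS = [
    ("A1", "2024-01-10", "filed"),
    ("A1", "2024-02-01", "approved"),
    ("B2", "2024-01-15", "filed"),
    ("C3", "2024-01-20", "filed"),
    ("C3", "2024-03-05", "denied"),
]


def case_history(case_id, events=EVENTS):
    """The lookup the system is built for: every event for one case."""
    return [event for event in events if event[0] == case_id]


def current_status_counts(events=EVENTS):
    """The analyst's question: replay the log, keeping each case's latest status."""
    latest = {}
    # ISO dates sort correctly as strings, so later events overwrite earlier ones.
    for case_id, _date, status in sorted(events, key=lambda event: event[1]):
        latest[case_id] = status
    return Counter(latest.values())


print(case_history("A1"))
print(current_status_counts())
```

A naive count of rows by status would report three "filed" cases here, although only one case is still in that state, which is the kind of workaround-dependent analysis this point describes.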
8. The Trustworthiness of Survey Data Is Under Threat
(11:25)
- Response rates are falling and AI-generated spam is making survey data less reliable.
- Administrative data—despite its flaws—may be the best route forward as survey data’s quality declines.
“In a future where survey data is heavily polluted, administrative records…could become increasingly valuable despite their gaps.” (12:00, Violet)
9. Organizational Incentives Can Make Government Data Messy
(12:20)
- Outdated systems force civil servants into convoluted workarounds, which produce messy, inconsistent data.
- Understanding the political and institutional context is often crucial for decoding data anomalies.
“Often they were last updated decades ago…Understanding underlying incentives…is invaluable for deciphering that information.” (12:56, Violet)
10. Being Useful Requires Practitioner Knowledge
(13:15)
- Unusual data quirks often trace back to specific historical or bureaucratic causes.
- Learning from long-term practitioners reveals vital context that’s invisible in the raw numbers.
- Even past regulatory changes (like a 2008 shift in employer documentation) still affect the reliability and structure of today’s data.
“If you want to discover anything new…you have to find out what others already understand by engaging with their expertise.” (13:45, Violet)
Notable Quotes & Memorable Moments
- “If you want to make a point based on data, stick to publishing graphs with a single red line going up or down and to the right. If you want to be honest, include detailed footnotes.” (08:50, Violet)
- “The accumulated knowledge of both policy changes and their implementation makes it readily apparent which data mysteries are actually the legacy of changes in user behavior.” (14:36, Violet)
- “Government data is an enormous catch-all, and nascent efforts to make it accessible like data.gov are a promising start to a challenging problem.” (15:09, Violet)
Additional Mentions
- Violet credits Jennifer Pahlka and IFP’s Amy Nice for further reading and inspiration.
- Thanks extended to Peter Bowman, Davis Conder, Sandegada, and Jeremy Neufeld for early comments, and Thomas Hockman for inspiring the essay format.
- Violet encourages feedback and additions via Twitter.
Conclusion
Violet’s “Ten Thoughts on Government Data” is an invaluable guide for anyone navigating U.S. data for policy or advocacy. Her takeaways highlight the need for humility, persistence, practitioner engagement, and clarity in communication. Throughout, her wit and practical advice make the episode both accessible and essential listening for would-be data-savvy policymakers.
