The Analytics Power Hour – Episode #290: "Always Be Learning"
Release Date: February 3, 2026
Hosts: Tim Wilson, Val Kroll
Guest: Martin Schulzberg (Product Manager & Staff Data Scientist, Spotify)
Overview: Exploring “Learning Rate” in Experimentation
This episode centers on how organizations, particularly Spotify, move beyond traditional "win rate" metrics in experimentation to focus on a richer, more informative "learning rate." The discussion features Martin Schulzberg, co-author of Spotify’s "Beyond Win Rate: Experiments with Learning Framework," delving into how to measure real learning in digital analytics, how culture and tooling shape this, and the nuances of experimentation at scale.
Key Discussion Points & Insights
1. The Limits of “Win Rate” and Introduction of “Learning Rate”
- Win Rate: Traditionally celebrated in experimentation; defined as the percentage of experiments that produce a statistically significant, positive improvement over the existing (control) variant.
- Learning Rate: A broader measure that captures three meaningful outcomes (a minimal classification sketch follows the quote below):
- Obvious Winner: The new variant is measurably better (win).
- Obvious Regression/Avoidance: Detects and avoids worse outcomes (dodge bullet).
- Neutral but Informative: Well-powered experiments showing no effect (after proper power analysis and sample size planning).
"I felt it was sort of under celebrated ... all of the other types of wins that you can make besides finding something better." – Martin (04:22)
2. Defining and Measuring Learning: The Spotify Approach
- Team Effort: Developed collaboratively by a cross-disciplinary central experimentation team and practicing data scientists.
- Neutral Outcomes: Seen as valuable only if the experiment design was rigorous (pre-registered sample size, appropriate power calculations); a sample-size sketch follows the quote below.
"Neutrality ... is informative because you can say, hey, maybe this is not worth pursuing because we actually ran a proper experiment..." – Martin (06:32)
3. Cultural Shifts: Moving From “Win Rate” to “Learning Rate”
- Spotify’s culture already prized avoiding mistakes, so the shift wasn't contentious for the “dodge bullet” scenario.
- More discussion was needed around classifying “neutral” outcomes and setting standards for rigor.
- Advocacy for open discussion within the analytics community about refining definitions of learning.
4. Preventing Metric Gaming & Avoiding “Neutral” as a Crutch
- Concerns discussed: analysts could game the learning rate by running trivial tests that pad the metric without producing real insight.
- Spotify watches for healthy distributions across wins, regressions, and neutral results; high rates of neutral outcomes prompt strategic reviews (see the sketch after the quote below).
"If we're finding a ton of neutral results … maybe we're hitting diminishing returns and we should try something different." – Martin (12:25)
5. Dealing with Invalid Experiments
- Invalid experiments (e.g., technical issues, setup errors) are monitored closely and kept as close to zero as possible.
- Improving tooling and processes is ongoing, especially for complex experiment environments (multiple devices, platforms).
6. Multiple Metrics, Guardrails, and Sample Size Complexity
- Success Metrics: What you want to improve.
- Guardrail Metrics: What must not get worse.
- Adjustments for multiple metrics (success and guardrails) are made at the statistical decision rule level.
"...at least one of the success metrics should have improved and none of the guardrail metrics should have been harmed." – Martin (23:01)
- Adequately powering all relevant metrics is computationally and statistically challenging, especially with low-frequency outcomes further down the funnel.
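A hedged sketch of the composite decision rule Martin describes: ship only if at least one success metric improves and no guardrail is harmed. The Bonferroni split of alpha across success metrics is one common multiplicity adjustment; Spotify's actual correction may differ.

```python
def ship_decision(success, guardrails, alpha=0.05):
    """success / guardrails: lists of (effect, p_value) tuples, where a
    positive effect means the metric moved in the desired direction."""
    # Split alpha across success metrics so that testing several of them
    # does not inflate the overall false positive rate (Bonferroni).
    alpha_success = alpha / max(len(success), 1)
    any_improved = any(eff > 0 and p < alpha_success
                       for eff, p in success)

    # Flag any guardrail with a significant move in the harmful direction.
    # (Real setups often test against a pre-set tolerance margin instead.)
    any_harmed = any(eff < 0 and p < alpha for eff, p in guardrails)

    return any_improved and not any_harmed
```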
7. Culture & Product Team Collaboration
- Cultural strategies include keeping the experimentation tooling opinionated and user-friendly while investing in robust educational programs.
- Product teams decide on metric frameworks through dialogues tailored to each team or project's maturity and goals.
"...we are sort of cultivating what the teams ... are thinking about this." – Martin (27:00)
8. Fishing Expeditions and Avoiding False Learnings
- The hosts and Martin discuss the temptation and pitfalls of exploring every metric (a "fishing expedition"), which risks false positives and overfitting.
"If you go on a fishing expedition, your false positive rate can go way up because you detect noise as a signal." – Tim (29:10)
- The proper response is to generate hypotheses from plausible theory, then validate them with replication or follow-up experiments (see the sketch below).
"One true possibility here is to replicate ... if I can repeat it, then I will ship it." – Martin (34:09)
9. Big Swings vs. Small Optimizations: The Intent/Optimize Matrix
- Teams often conflate “identifying” whether something matters with “optimizing” for it.
- Martin advocates for “maximum viable” changes when unsure of a feature’s impact, to ensure any real effect is detectable.
"If you don't like the neutral result, it means that the question you posed wasn't interesting enough." – Martin (39:44)
- Take "big swings" before spending resources on incremental optimization.
10. Operationalizing Neutral Results: To Ship or Not to Ship?
- If an experiment is planned with the understanding that neutrality is acceptable (e.g., infrastructure changes), proceed if guardrail metrics confirm no harm.
"...as long as you've decided before you run the experiment that you're going to ship if it's neutral, I'm all good with it." – Martin (49:50)
11. Education & Tooling at Scale
- Spotify invests heavily in educational materials, boot camps, opinionated tooling, and onboarding processes to make experimentation accessible and systematic across hundreds of teams.
"...infiltrated the whole organization with experimentation onboarding and materials and that has helped." – Martin (54:40)
Notable Quotes & Memorable Moments
- On moving past win rate: "...all of the other types of wins that you can make besides finding something that was better than the current version ... doesn't really reflect how most companies are actually using experimentation." – Martin (03:45)
- On being culturally open to questioning metrics: "There's always people questioning everything at Spotify, which is one of the things that I love..." – Martin (08:01)
- On replication and exploration: "If you have a streamlined...way to run experiments ... one true possibility here is to replicate." – Martin (34:16)
- On the importance of big swings: "A big swing with a neutral result feels like it has a lot more merit than a little small tap with a neutral result." – Tim (39:37)
- On managing neutrality and shipping: "We are planning to ship this as long as it's not bad...by using the rollout, you're declaring your intent..." – Martin (51:15)
Important Timestamps
- [02:05] – Introduction and context: moving from win rate to learning rate.
- [04:22] – Drawbacks of “win rate” and Spotify’s expanded learning framework.
- [06:32] – Defining neutral but valuable learning in experiment outcomes.
- [08:01] – Cultural adaptation at Spotify and cross-team collaboration.
- [12:25] – Using learning rate distributions to identify diminishing returns.
- [15:56] – Customizing experimentation metrics for different product teams.
- [19:12] – Sample size, power analysis, and ongoing experiment evaluation.
- [23:01] – Handling multiple metrics: success vs guardrail.
- [25:36] – Culture and the challenge of metric selection in experiments.
- [29:10] – Caution on fishing for effects – risk of false positives.
- [34:09] – Replication as defense against accidental “false learning.”
- [39:44] – The need for interesting, bold experiment questions.
- [49:50] – Shipping neutral results: importance of setting intent pre-experiment.
- [53:14] – Scaling education and onboarding for experimentation at Spotify.
- [55:22] – How platform design and education work together at Spotify.
Flow and Tone
- The discussion is engaging, pragmatic, and packed with real-world context, reflecting the analytic rigor and cultural openness typical of both Spotify and the podcast hosts.
- Martin Schulzberg’s explanations are technical yet relatable, often punctuated by candid humor between him and the hosts.
- The conversation maintains a collaborative and inquisitive tone, frequently inviting audience exploration and feedback on evolving concepts of measurement and learning in analytics.
Resources and Further Reading
- Spotify's "Beyond Win Rate" Experimentation Framework Article (link referenced in episode's show notes)
- 3Blue1Brown YouTube Channel – Recommended by Martin for visual math explanations, especially for ML/AI topics. ([57:00])
- Medium: "Escaping the AI: Why MVPs Should Be Delightful" by James Skinner – Mentioned by Val (58:33)
- Katie Bauer’s "The Next Data Bottleneck" on Substack – Recommended by Tim (59:47)
Summing Up
This episode offers a nuanced, inside look at how Spotify and leading analytics practitioners are raising the bar for what it means to “learn” from experimentation. Rather than being tied to simple win/loss metrics, teams are encouraged to think strategically about all outcomes, the value of neutrality, and the absolute requirement for rigor, intent, and thoughtful culture at every stage.
For those involved in product analytics, experimentation, or building data-driven cultures, this episode is full of practical insights—and a reminder that learning is always at the heart of progress.
