Podcast Summary
Podcast: Scrum Master Toolbox Podcast
Episode: When AI Decisions Go Wrong at Scale—And How to Prevent It With Ran Aroussi
Host: Vasco Duarte
Guest: Ran Aroussi
Date: February 16, 2026
Episode Overview
This special bonus episode explores a crucial and timely topic: how to ensure transparency, observability, and governance in AI systems—especially as these systems become more autonomous and the scale (and risks) of wrong decisions grows. Guest Ran Aroussi, founder of Muxi and author of Production Grade Agentic AI, shares hard-won lessons and actionable advice for keeping AI aligned with human goals in real-world production contexts.
Key Discussion Points & Insights
1. Ran Aroussi’s Origin Story and Perspectives on AI at Scale
- Ran’s Background: Spanning 35+ years, from programming as a teenager to building tools for finance and edtech, and now focusing on production-ready AI agents.
- “I started when I was 13... all through the EdTech space... writing algorithms for online media exchanges... I've created yfinance, which is the library to download stock information.” (03:37)
- Why AI is a Different Challenge: The leap from cool AI demos to real, large-scale production deployment is massive.
- “I've watched well-designed agents make perfectly reasonable decisions... but in a context where the decision was catastrophically wrong, and there was really no way of knowing what had happened until the damage was already there.” (07:30)
2. The Heart of the Problem: Non-Determinism and Unpredictability
- AI’s Non-deterministic Nature: An AI doesn’t behave like traditional, deterministic software—even repeated inputs can yield subtly different outputs.
- “With AI... if your software expected a specific output format, then it’s not working anymore.” (11:37)
- User Input Complexity: Real users introduce infinite variability and edge cases that can’t be anticipated in testing.
- “When you deploy a system in production, you have users with... unimaginably different use cases and different problems... the AI suddenly starts bringing in completely different answers and behaves a lot differently.” (10:03)
- Testing Limits: The language-driven ‘input space’ is essentially infinite, making exhaustive pre-release testing impossible.
- “The input space for AI systems is practically infinite because it's language, you can never test it.” (12:38)
3. Layers of Safety: Observability, Guardrails, and Logging
Guardrails are Essential but Not Enough
- Guardrails are Only One Layer:
- “Guardrails... It has to be a multi-layered system that protects the output.” (13:18)
- “First off, you need to... lock the version of the model that you're working with... and then you also need to put in the guardrails.” (13:20)
- Deterministic Filters and Firewalls: Use additional checks that automatically block inappropriate or dangerous outputs.
- “The simplest example is like those profanity filters that we used to have on forms... to block personally identifiable information from coming back to the user.” (15:32)
- Observability & Logging: Deep observability is needed—capture what was decided, by what workflow, with what tools, so that patterns, failures, and risks can be analyzed and audited.
- “Finally the most important thing is that you have detailed logs on what had happened... Not to mention if...your agents are using A2A and communicate with completely external agents as well, that's a lot of things that you at least should be able to trace back.” (15:58)
Example of Multi-layered Defense
- Lock to specific, tested AI model versions
- Use AI to generate massive test cases
- Implement traditional deterministic “firewalls” for data sanitization
- Build robust, structured logging and analytics for real-time and post-mortem evaluation
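The deterministic “firewall” layer above can be sketched as a small post-model filter that redacts PII patterns before output reaches the user. This is an illustrative sketch only, not code from the episode; the pattern set, function name, and redaction format are assumptions:

```python
import re

# Hypothetical deterministic output filter: runs AFTER the model, so it
# catches bad output regardless of how the model misbehaved. Patterns
# here are illustrative, not exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def sanitize_output(text: str) -> tuple[str, list[str]]:
    """Redact PII from model output; return cleaned text plus which rules fired."""
    hits = []
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            hits.append(name)
            text = pattern.sub(f"[REDACTED-{name.upper()}]", text)
    return text, hits

clean, flags = sanitize_output("Contact me at jane.doe@example.com")
```

Because the filter is deterministic, it can be unit-tested exhaustively even though the model's own input space cannot, which is exactly why it belongs in the stack alongside model version locking and guardrails.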
4. The Reality of Observability in Agentic AI (18:17–25:03)
- Classic Observability Tools Only Go So Far: Traditional monitoring/logging is inadequate for the complex, branching workflows and agent-to-agent interactions of modern AI.
- “There are a lot of tools that are doing that... but they're better at LLM, at chat, and at conversational observability rather than agentic observability.” (20:24)
- Multiple Perspectives Required: Engineers need deep, raw traces; managers need sanitized summaries; all need to be able to “replay” surprising outcomes to find blind spots.
- “...engineers need to have access to full level of data for full debugging...then you have to have more of a managerial where there's no personal information, you just show flow based on topic.” (21:23)
- Dynamic, Personalized Workflows: Even identical user prompts can produce different workflows depending on context, prior preferences, and available tools.
- “It’s a dynamic workflow that just got built based on who is talking to the system and their preferences and their history and the tools and everything.” (24:40)
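The tracing requirements above, one raw trace per workflow for engineers plus a sanitized flow view for managers, can be sketched roughly as follows. The class and field names are assumptions for illustration, not an API discussed in the episode:

```python
import json
import time
import uuid

# Illustrative agentic trace log: every step an agent takes is appended
# under a single trace_id, so a surprising workflow can be replayed
# step by step after the fact.
class AgentTrace:
    def __init__(self, user_id: str):
        self.trace_id = str(uuid.uuid4())
        self.user_id = user_id
        self.steps: list[dict] = []

    def log_step(self, agent: str, tool: str, decision: str, **context):
        self.steps.append({
            "trace_id": self.trace_id,
            "ts": time.time(),
            "agent": agent,
            "tool": tool,
            "decision": decision,
            "context": context,  # raw inputs/outputs; may contain PII
        })

    def replay(self) -> str:
        # Engineer view: full raw trace, one JSON line per step.
        return "\n".join(json.dumps(step) for step in self.steps)

    def summary(self) -> list[str]:
        # Managerial view: flow by topic only, no raw context or PII.
        return [f"{s['agent']} -> {s['tool']}: {s['decision']}" for s in self.steps]

trace = AgentTrace(user_id="u-42")
trace.log_step("planner", "search", "look up account history", query="...")
trace.log_step("executor", "transfer_api", "draft transfer order", amount=10_000)
```

The same underlying log serves both audiences; only the projection differs, which keeps the engineer's replay and the manager's summary guaranteed to describe the same workflow.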
5. Governance: It’s About Human Alignment, Not Control (25:19–33:00)
- Governance as “Human-in-the-Loop”: For critical actions, like high-value financial transfers, humans must approve or review AI-suggested actions—but not be overwhelmed by micro-management.
- “Governance isn’t about control, it’s about keeping people in the loop.” – Vasco Duarte (25:19)
- “I still wouldn't be comfortable to approve a transfer to AI and have it send the money... At most I will allow it to... just put in the order, let me go and I will check that all the transfers are correct and I'll just have to say yes, yes, yes.” – Ran Aroussi (27:12)
- Appropriate Automation Level: Certain routine operations can be delegated to AI; risky/irreversible actions require explicit human oversight.
- Trust is Built Over Time: AI systems, like co-workers, will earn more autonomy as reliability is demonstrated.
- “You have to work with them, you have to make sure that their performance is on par...” (33:00)
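Ran's transfer example, let the AI draft the orders, then have a human batch-approve them, can be sketched as a simple approval gate. The threshold, class names, and action model are assumptions for illustration:

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: transfers above this need explicit human approval.
APPROVAL_THRESHOLD = 1_000

@dataclass
class Action:
    kind: str
    amount: float
    approved: bool = False

@dataclass
class GovernanceGate:
    pending: list[Action] = field(default_factory=list)
    executed: list[Action] = field(default_factory=list)

    def submit(self, action: Action):
        if action.kind == "transfer" and action.amount > APPROVAL_THRESHOLD:
            self.pending.append(action)   # draft only; a human must approve
        else:
            self.execute(action)          # routine: runs automatically

    def approve_all(self):
        # Human reviews the batch and confirms ("yes, yes, yes").
        for action in self.pending:
            action.approved = True
            self.execute(action)
        self.pending.clear()

    def execute(self, action: Action):
        self.executed.append(action)

gate = GovernanceGate()
gate.submit(Action("transfer", 50))       # below threshold: auto-executed
gate.submit(Action("transfer", 10_000))   # above threshold: held for review
```

As trust in the system grows, raising the threshold (or whitelisting more action kinds) widens the AI's autonomy without ever removing the human checkpoint for irreversible operations.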
Notable Quotes & Memorable Moments
- “When AI systems make decisions no one can explain at scale... that’s far more dangerous if we get it wrong.” – Vasco Duarte (01:11)
- “I've watched well-designed agents make perfectly reasonable decisions... but... catastrophically wrong and there was no way of knowing what happened until the damage was already there.” – Ran Aroussi (07:31)
- “With AI, the agents are quite capable... but I think that deploying it safely right now is more about deploying it reliably and about visibility and auditability. Especially if you are an enterprise.” – Ran Aroussi (08:13)
- “Guardrails... have to be a multi-layered system...” – Ran Aroussi (13:18)
- “You need to at the very minimum build that trust in the performance and the result. And that wouldn't happen without starting with a large amount of human in the loop interaction.” – Ran Aroussi (33:54)
- “Our aspiration is to have a human-like counterpart in the AI, but even with humans, you don’t just meet someone and trust them completely.” – Ran Aroussi (33:00)
Timestamps for Key Segments
- [01:11] – Introduction of episode theme: AI transparency, alignment, risks of unexplainable at-scale decisions
- [03:37] – Ran Aroussi’s background and motivations
- [07:01] – The unique challenge of non-deterministic AI
- [10:03] – Common leader misunderstandings about AI predictability
- [13:18] – Multi-level safety: guardrails, model version locking, deterministic filtering
- [15:58] – Importance of deep observability and comprehensive logs
- [18:17] – Observing and reproducing agentic AI workflows, handling the “true infinite” input space
- [25:19] – Governance defined: human-in-the-loop, appropriate intervention points
- [27:12] – Example: human oversight for high-value transfers
- [31:41] – Trust and reliable autonomy: AI as a partner in processes
Resources & Recommendations
- Ran’s Book: Production Grade Agentic AI: From Brittle Workflows to Deployable Autonomous Systems
- Meta Recommendation: Study systems engineering and operational excellence, not just AI—recommended book: Thinking in Systems by Donella Meadows (36:06)
- Machine Learning Systems: Designing Machine Learning Systems (author to be provided; recommended by Ran) (36:21)
- Follow Ran on X (Twitter): @aroussi
Tone and Takeaway
The episode is conversational, candid, and technical, offering hard-earned, pragmatic insights into the real risks and requirements of safe AI deployment in the workplace. The message is not alarmist but clearly cautionary and practical: real-world AI must be built for auditability, iterative improvement, and, above all, continued human involvement and oversight.
For more, check out the referenced resources and connect with Ran and Vasco in the show notes.
