AWS Bites: Episode 152 — Exploring Lambda Durable Functions
Episode Date: February 6, 2026
Hosts: Eoin Shanaghy & Luciano Mammino
Overview
This episode of AWS Bites dives deep into AWS Lambda Durable Functions, a major new feature announced at re:Invent 2025. The hosts discuss what "durable" functions actually mean for Lambda, how the new patterns work, key use cases, and their firsthand experience rebuilding an open source app to take advantage of the new capabilities. They also address sticky points—like the deterministic execution model, debugging gotchas, comparison with Step Functions, cost considerations, and alternatives outside AWS.
Key Topics & Discussion Points
1. What Are Lambda Durable Functions?
[00:00–09:10]
- Lambda Durable Functions add orchestration and persistent state to regular Lambdas.
- Key enhancements include durable workflows, checkpointing, suspending/waiting, and resuming execution.
- They're opt-in: “There is a new flag that you can turn on to basically turn a regular lambda function into a durable lambda function, and that [...] opts in the function into what is called the durable execution engine […] through a dedicated SDK.” — Luciano, [02:40]
- Introduces a mental shift: design a workflow as a sequence of atomic steps, not as a single invocation.
- Durable executions can suspend for planned (e.g., timers, callbacks, human approvals) or unplanned (errors, timeouts) reasons, resuming later from the last checkpointed state.
- A single invocation is still capped at 15 minutes, but the overall durable execution can last up to one year.
Memorable Quote:
“It’s still lambda. It still has the same runtime, the same scaling, but with a framework that can now checkpoint progress, suspend execution... and resume later from a safe point, skipping the work you already completed.” — Eoin, [00:57]
2. Durable Functions vs. Standard Lambda & Key Use Cases
[09:10–15:47]
- Compared to raw Lambdas, Durable Functions eliminate the pain of rolling your own orchestration with queues, event triggers, and context handoff between invocations.
- Checkpoints mean that completed steps aren’t redone after resume.
- Step-level resilience: Retry, backoff, and error handling are all available as first-class features.
- Common use cases:
  - Human-in-the-loop workflows (approvals, callbacks)
  - Order processing (as per Yan Cui’s newsletter), with multi-stage flows and restaurant confirmations
  - Multi-step onboarding flows, complex payment retry logic, media processing workflows
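The step-level retry and backoff mentioned in the list above can be pictured with a small sketch. This is a toy model rather than the Durable Functions SDK: retryStep, RetryPolicy, and the doubling schedule are assumptions made for illustration, and the delays are recorded instead of slept so the example runs instantly.

```typescript
// Toy model of step-level retry with exponential backoff.
// retryStep and RetryPolicy are hypothetical names, not part of any AWS SDK.
type RetryPolicy = { maxAttempts: number; baseDelayMs: number };

type RetryOutcome<T> = { value: T; attempts: number; delaysMs: number[] };

function retryStep<T>(fn: (attempt: number) => T, policy: RetryPolicy): RetryOutcome<T> {
  const delaysMs: number[] = [];
  for (let attempt = 1; ; attempt++) {
    try {
      return { value: fn(attempt), attempts: attempt, delaysMs };
    } catch (err) {
      if (attempt >= policy.maxAttempts) throw err; // retries exhausted
      // Backoff doubles each time: base, 2x base, 4x base, ...
      delaysMs.push(policy.baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// A payment call that fails twice before succeeding.
const payment = retryStep(
  (attempt) => {
    if (attempt < 3) throw new Error(`gateway unavailable (attempt ${attempt})`);
    return "charged";
  },
  { maxAttempts: 5, baseDelayMs: 1000 },
);
```

Here the third attempt succeeds after two recorded backoff delays. In a durable workflow the delay would presumably be a suspension rather than a busy wait, which is what makes long retry schedules affordable.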
Notable Example:
“...you can build an order processing workflow for a food delivery service… durable lambda function which implements the following workflow in steps: save the order, broadcast to EventBridge, wait for a restaurant confirmation, potentially use a callback for human approval, handle timeouts, then track order progress, feedback, etc.” — Luciano, [11:30]
3. Developer Experience: How Do Durable Functions Work?
[15:47–19:26]
- Written as regular Lambdas but wrapped with the Durable Functions SDK (withDurableExecution).
- Available for JavaScript/TypeScript and Python (others, like Java and Rust, are in progress).
- Instead of writing everything in the handler, define named atomic steps (e.g., context.step('step1', ...)) and let the platform handle checkpointing.
- Explicit constructs for waiting: e.g., “wait for N minutes” does not burn runtime or cost.
Key Point:
“… a function execution spans multiple invocations, but it still feels like one flow, even though it seems like a single sequential thing under the hood.” — Eoin, [17:20]
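To make the "one flow, several invocations" idea concrete, here is a minimal sketch of a handler that runs a step, waits, and then runs another step. It does not use the real SDK: ToyContext, Suspend, and runDurably are hypothetical stand-ins, the signatures of step and wait are assumed, and a virtual clock replaces real time.

```typescript
// Toy model of "suspend on wait, resume later". NOT the AWS SDK:
// ToyContext, Suspend and runDurably are hypothetical names.
type ExecutionState = {
  results: Map<string, unknown>; // checkpointed step results
  timers: Map<string, number>; // wait name -> due time (virtual ms)
  nowMs: number; // virtual clock, so nothing really sleeps
};

class Suspend {
  resumeAtMs: number;
  constructor(resumeAtMs: number) {
    this.resumeAtMs = resumeAtMs;
  }
}

class ToyContext {
  state: ExecutionState;
  constructor(state: ExecutionState) {
    this.state = state;
  }

  step<T>(name: string, fn: () => T): T {
    if (this.state.results.has(name)) return this.state.results.get(name) as T;
    const result = fn();
    this.state.results.set(name, result);
    return result;
  }

  // Ends the current invocation until the timer is due.
  wait(name: string, durationMs: number): void {
    const dueMs = this.state.timers.get(name) ?? this.state.nowMs + durationMs;
    this.state.timers.set(name, dueMs);
    if (this.state.nowMs < dueMs) throw new Suspend(dueMs);
  }
}

function runDurably<T>(handler: (ctx: ToyContext) => T): { output: T; invocations: number } {
  const state: ExecutionState = { results: new Map(), timers: new Map(), nowMs: 0 };
  for (let invocations = 1; ; invocations++) {
    try {
      return { output: handler(new ToyContext(state)), invocations };
    } catch (signal) {
      if (!(signal instanceof Suspend)) throw signal;
      state.nowMs = signal.resumeAtMs; // pretend the platform woke us up later
    }
  }
}

let usersCreated = 0;
const run = runDurably((ctx) => {
  const user = ctx.step("create-user", () => {
    usersCreated += 1;
    return { email: "ada@example.com" };
  });
  ctx.wait("cool-off", 5 * 60 * 1000); // the invocation ends here the first time
  return ctx.step("send-welcome", () => `welcome sent to ${user.email}`);
});
```

The handler reads as one sequential function, yet the toy runner needs two invocations to finish it, and the create-user body runs only once because the second invocation reads its result from state.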
4. Under the Hood: Checkpointing & the Determinism Model
[19:26–24:58]
- Each time a step is completed, its result is checkpointed.
- On resumption: Execution always restarts from the beginning, re-executing code up to the step not yet completed, but skipping actual execution for already-completed steps by reading their persisted results.
- Crucial best practice: Only results produced inside steps are persisted and replayed identically. Anything random or time-based outside of steps (the “orchestrator path”) is re-evaluated on every replay and can cause bugs.
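The replay behaviour described above can be sketched with an in-memory checkpoint store. This is a toy model under stated assumptions, not the durable execution engine: ToyContext and runExecution are hypothetical names, and a thrown error stands in for an unplanned interruption such as a timeout.

```typescript
// Toy model of checkpoint-and-replay. NOT the AWS SDK: ToyContext and
// runExecution are hypothetical names used only for illustration.
class ToyContext {
  checkpoints: Map<string, unknown>;
  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  step<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T; // completed earlier: skip the work
    }
    const result = fn();
    this.checkpoints.set(name, result); // checkpoint only after success
    return result;
  }
}

// Re-invoke the handler from the top until it finishes or we give up.
function runExecution<T>(handler: (ctx: ToyContext) => T, maxInvocations: number): T {
  const checkpoints = new Map<string, unknown>();
  let lastError: unknown = new Error("handler was never invoked");
  for (let i = 0; i < maxInvocations; i++) {
    try {
      return handler(new ToyContext(checkpoints));
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

const executed: string[] = []; // which step bodies actually ran
let chargeAttempts = 0;

const receipt = runExecution((ctx) => {
  const order = ctx.step("save-order", () => {
    executed.push("save-order");
    return { id: 42 };
  });
  return ctx.step("charge", () => {
    executed.push("charge");
    chargeAttempts += 1;
    if (chargeAttempts === 1) throw new Error("simulated crash");
    return `receipt-for-${order.id}`;
  });
}, 3);
```

After the simulated crash, the second invocation starts from the first line of the handler again, but save-order is served from the checkpoint store, so only charge is re-executed.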
Memorable Moment:
“If there’s one thing you should take away...: when the lambda resumes... it always starts to execute your code from the beginning... every time a step is encountered, the SDK checks if it’s completed... if it did, just takes the value from state.” — Luciano, [21:00]
5. Ecosystem, SDKs & Developer Tools
[24:58–26:42]
- TypeScript SDK is mature, supports mocking/testing/local runs.
- Community tools (like middy and Lambda Powertools) are aligning quickly to support durable features.
- Minor SDK and Console bugs exist; these are expected to improve.
- The model doesn’t require runtime changes, just SDKs, so language support is likely to expand quickly.
6. Real-World Example: Rebuilding PodWhisperer with Durable Functions
[26:42–37:44]
- PodWhisperer is their open source podcast transcription workflow.
- Originally used both OpenAI Whisper and Amazon Transcribe; now shifting to WhisperX for better diarization and word-level timestamps.
- Steps in the new workflow:
  - File upload triggers workflow via EventBridge.
  - Step triggers file processing in ECS Managed Cluster with GPU, pausing the durable execution until a callback.
  - Replacement rules and regex cleaning.
  - Bedrock LLM prompt for further transcript refinement, correcting terms and identifying speakers.
  - Segment normalization with word-level timestamps for readable subtitle chunks.
  - Caption generation (SRT/VTT/JSON formats).
  - Final EventBridge event signals completion.
- Episoder (another open source tool) automates post-processing and publishing.
- Durable Functions made the complex multi-step process simpler and more reliable.
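As an illustration of the caption-generation stage, the sketch below turns timed segments into SRT text. It is not PodWhisperer's actual code: the Segment shape and the function names are assumptions, and only the standard SRT layout (index, time range, text, blank line) is taken as given.

```typescript
// Sketch of SRT caption generation. Not PodWhisperer's actual code:
// Segment, toSrtTimestamp and toSrt are hypothetical names.
type Segment = { start: number; end: number; text: string }; // times in seconds

function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const pad = (value: number, width: number): string => String(value).padStart(width, "0");
  const hours = Math.floor(totalMs / 3600000);
  const minutes = Math.floor((totalMs % 3600000) / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  // SRT uses a comma before the milliseconds: HH:MM:SS,mmm
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(secs, 2)},${pad(ms, 3)}`;
}

function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}\n${seg.text}\n`)
    .join("\n"); // blank line between cues
}

const srt = toSrt([
  { start: 0, end: 3.5, text: "Hello" },
  { start: 3.5, end: 3661.25, text: "World" },
]);
```

Wrapped in a step, a pure function like this is cheap to checkpoint and safe to skip on replay, since the same segments always produce the same captions.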
Timestamped Narrative:
“We drop a file into a stream, that creates an EventBridge event... starts the durable execution. First thing: send an event into SQS... which triggers ECS cluster for transcription. After that, execution pauses waiting for a callback…” — Luciano, [31:35]
7. Gotchas, Best Practices, & Common Pitfalls
[37:44–39:17]
- Don’t put non-deterministic code (random values, timestamps, UUIDs) outside steps! Results persisted inside steps are restored on resume; anything else is recomputed on every replay and may behave unpredictably.
- Design for idempotency.
- Use context-provided, replay-aware logging.
- Generative AI tools don’t understand these rules and may suggest unsafe patterns.
- Example: to measure total execution time, capture the timestamps inside steps so they survive replays.
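The determinism pitfall is easiest to see side by side. In this toy sketch (again not the real SDK; ToyContext is a hypothetical name), nextId stands in for any non-deterministic source such as a random number, a UUID, or the current time, and the handler is invoked twice against the same checkpoint store, as a replay would do.

```typescript
// Toy replay model showing why values created outside a step are unsafe.
// NOT the AWS SDK: ToyContext is a hypothetical name, and nextId stands in
// for any non-deterministic source (random numbers, UUIDs, Date.now()).
class ToyContext {
  checkpoints: Map<string, unknown>;
  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  step<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) return this.checkpoints.get(name) as T;
    const result = fn();
    this.checkpoints.set(name, result);
    return result;
  }
}

let counter = 0;
const nextId = (): string => `id-${++counter}`;

function handler(ctx: ToyContext): { unsafeId: string; safeId: string } {
  const unsafeId = nextId(); // orchestrator path: recomputed on every replay
  const safeId = ctx.step("make-id", () => nextId()); // computed once, then replayed
  return { unsafeId, safeId };
}

// Two invocations of the same execution share one checkpoint store.
const checkpoints = new Map<string, unknown>();
const firstRun = handler(new ToyContext(checkpoints));
const replay = handler(new ToyContext(checkpoints));
```

The value created inside the step is identical across both invocations, while the one created outside changes. The same reasoning applies to timing: a start timestamp captured outside a step would be reset on every replay.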
8. Pricing
[39:17–43:20]
- Standard Lambda pricing applies.
- Extra cost for durable operations:
  - $8 per million durable operations (checkpoints, steps, waits, callbacks).
  - $0.25 per GB persisted data.
  - $0.15 per GB/month for data retention.
- Retention period is configurable.
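A quick worked estimate helps put the rates above in perspective. The sketch below uses only the figures quoted in the episode; estimateDurableCost, DurableUsage, and the sample workload are hypothetical, and current AWS pricing should be checked before relying on the numbers.

```typescript
// Back-of-the-envelope estimate of the durable surcharge, using the rates
// quoted in the episode. estimateDurableCost and DurableUsage are
// hypothetical names; standard Lambda charges are not included.
type DurableUsage = {
  operations: number; // checkpoints, steps, waits, callbacks
  gbWritten: number; // GB of persisted data
  gbRetainedMonths: number; // GB-months of retained data
};

const RATES = {
  perMillionOperations: 8.0,
  perGbWritten: 0.25,
  perGbMonthRetained: 0.15,
};

function estimateDurableCost(usage: DurableUsage): number {
  return (
    (usage.operations / 1000000) * RATES.perMillionOperations +
    usage.gbWritten * RATES.perGbWritten +
    usage.gbRetainedMonths * RATES.perGbMonthRetained
  );
}

// Sample workload: 2 million operations, 10 GB written, 10 GB kept for a month.
const monthly = estimateDurableCost({ operations: 2000000, gbWritten: 10, gbRetainedMonths: 10 });
```

For this sample workload the surcharge works out to roughly 16 + 2.50 + 1.50 = 20 dollars, so the per-operation charge dominates unless steps persist large payloads.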
9. Comparisons: Step Functions & Alternatives
[43:20–47:00]
- Durable Functions vs. Step Functions:
  - Durable Functions: Use code/SDK, not Amazon States Language (ASL); better for complex, code-first workflows.
  - Step Functions: Visual workflow builder, better AWS integrations, and good for non-code orchestration.
  - Durable Functions handle all Lambda event sources; Step Functions have more limited triggers.
  - Durable Functions testing is easier and more ergonomic in code.
  - Step Functions are better for massive parallelism (Distributed Map jobs).
  - Visualization: Step Functions has a built-in flow view; you’d need to document the flow in Durable Functions yourself.
Quote:
“Durable functions are just Lambda functions… One of the advantages is you can use any event source… [with] step functions you don’t have the same set of supported integrations.” — Eoin, [43:37]
- Alternatives:
  - DBOS: open source, runs on Postgres, multiple languages
  - temporal.io: popular orchestration-as-a-service
  - trigger.dev: code-based, SaaS-like orchestration
- The durable execution pattern is not new, but it now has native AWS support.
Notable Quotes & Moments
- Eoin ([00:57]): "It still has the same runtime, the same scaling, but with a framework that can now checkpoint progress, suspend execution when you need to wait, can resume later from a safe point, skipping the work you already completed."
- Luciano ([21:00]): "Every time a step is encountered again... [the SDK] will check, 'Did I already complete this?'...if it did, not going to re-execute, just takes the value from state."
- Eoin ([43:37]): "Durable functions are just lambda functions, right?... you can use any event source that works with lambda..."
- Luciano ([45:29]): "You don't get a visualization built in... for durable functions that's something you need to do yourself..."
Quick Best-Practice Recap
- Put all side-effectful, random, or time-based logic inside steps.
- Design for idempotency and clear step boundaries.
- Use replay-aware logging.
- Be cautious with AI-generated code—it may miss critical determinism requirements.
- Durable Functions are best for orchestrating multi-step, code-centric workflows, especially those featuring human approval, timeouts, and external callbacks.
Final Thoughts
Durable Functions are a compelling new middle ground between classic Lambda and full-blown orchestration, with big benefits for complex, code-driven AWS workflows. They're code-first, event-driven, and support long-lived executions (up to a year), with built-in checkpointing, suspension, and resume—all while only paying for what you use.
Let the hosts know: What’s your experience with Lambda Durable Functions? What worked, what didn't, and where do they fit best over Step Functions?
Important Timestamps
- [00:00] — Intro, Why Lambda needs "superpowers"
- [02:28] — What makes a Lambda "durable"?
- [11:09] — Use cases: Food delivery order workflow, media processing
- [15:47] — Developer experience & SDKs
- [19:26] — Under the hood: Checkpointing, determinism, and replay model
- [26:42] — Real-world app: PodWhisperer and media transcription pipeline
- [37:44] — Best practices & common mistakes
- [39:17] — Pricing
- [43:20] — Step Functions vs Durable Functions
- [45:05] — Visualization, parallelism, and fit for AI workflow
Full episode resources, code examples, and more at the AWS Bites website.
