AWS Bites – Episode 150: Exploring All-New ECS Managed Instances (MI) Mode
Date: November 28, 2025
Hosts: Eoin Shanaghy & Luciano Mammino
Episode Overview
In this milestone episode, Eoin and Luciano take a deep dive into the brand-new ECS Managed Instances (MI) mode, examining how it fits into the container orchestration spectrum on AWS. They explore how ECS MI promises to bridge the gap between the simplicity of Fargate and the control of running ECS on EC2, while addressing GPU support, cost, scaling, performance, and real-world use cases. The episode is a hands-on, experience-driven evaluation, including lessons learned, potential pitfalls, and candid reflections from two AWS veterans.
Key Discussion Points & Insights
Understanding the Container Hosting Spectrum
- Fargate is highly favored for its "serverless for containers" simplicity, offering fast and reliable container deployments with minimal infrastructure management. However, it lacks features like GPU support, advanced storage, and custom networking.
- Memorable quote (A, 00:00):
"Anytime that AWS gives us a nice way of doing less configuration, getting up and running quickly and reliably, we grab the opportunity."
- ECS on EC2 allows full instance specification and control, but demands heavy operational management: handling AMIs, patching, auto-scaling groups, and security.
- ECS Managed Instances lands "in the middle" by offloading infrastructure management to AWS while letting users granularly specify infrastructure requirements (CPU, memory, GPU, etc.).
- Memorable quote (B, 03:49):
"With this third option, basically you keep the flexibility of EC2, but without that management burden."
Cost Model & Pricing Nuances
- Pricing Structure:
- You pay regular EC2 per-second pricing, plus a 12% management fee (calculated on the on-demand instance price).
- Benefits of Compute Savings Plans/Reserved Instances still apply.
- Key Limitation: No Spot Instance support (yet), which restricts potential cost savings for some use cases.
- Quote (A, 05:18):
"ECSMI does not support spot instances... That's a pity, especially since we thought we'd be able to use this to get really, really cheap infrastructure."
- Cost Effectiveness:
- The value proposition hinges on the time saved in ops management vs. the 12% premium.
- Existing advanced ECS EC2 users with effective scaling and reserved pricing may see less benefit.
- Benchmarking against current workloads and costs is necessary for decision-making.
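To make the "time saved vs. 12% premium" trade-off concrete, here is a rough Python cost sketch. The prices are hypothetical placeholders, 730 hours per month is only a common approximation, and the way discounts combine with the fee is an assumption to check against the AWS pricing page: Savings Plans or Reserved Instances are assumed to reduce only the EC2 portion, while the fee stays based on the on-demand price.

```python
# Rough ECS MI cost sketch. Prices are hypothetical placeholders, not real AWS rates.
HOURS_PER_MONTH = 730          # common approximation for monthly estimates
MANAGEMENT_FEE_RATE = 0.12     # 12% of the on-demand price, as described in the episode


def hourly_cost(on_demand_price: float, ec2_discount: float = 0.0) -> float:
    """Hourly cost of one instance: discounted EC2 portion plus the management fee.

    Assumption: Savings Plans / Reserved Instances discount only the EC2 portion,
    while the fee is still computed on the on-demand price.
    """
    if not 0.0 <= ec2_discount < 1.0:
        raise ValueError("ec2_discount must be in [0, 1)")
    ec2_portion = on_demand_price * (1.0 - ec2_discount)
    management_fee = on_demand_price * MANAGEMENT_FEE_RATE
    return ec2_portion + management_fee


def monthly_management_fee(on_demand_price: float, instances: int) -> float:
    """Management fee per month for a fleet running around the clock."""
    return on_demand_price * MANAGEMENT_FEE_RATE * instances * HOURS_PER_MONTH


def break_even_ops_hours(monthly_fee: float, ops_hourly_rate: float) -> float:
    """Ops hours per month that MI must save for the fee to pay for itself."""
    return monthly_fee / ops_hourly_rate


if __name__ == "__main__":
    fee = monthly_management_fee(1.00, instances=10)
    print(f"{hourly_cost(1.00):.2f}")                      # 1.12
    print(f"{hourly_cost(1.00, ec2_discount=0.30):.2f}")   # 0.82
    print(f"{fee:.2f}")                                    # 876.00
    print(f"{break_even_ops_hours(fee, 100.0):.2f}")       # 8.76
```

With these made-up numbers, a fleet of ten $1.00/hour instances pays about $876 per month in management fees, so the fee pays for itself if it saves roughly 9 hours of ops work per month at an assumed $100/hour.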
Breaking Down ECS MI Terminology
Luciano provides a concise, plain-English glossary for newcomers and those needing a refresher (08:20):
- Cluster: Logical home for your ECS workloads, grouping infrastructure and tasks.
- Task Definition: Blueprint for a container workload, specifying resources and configs.
- Task: Running instance of a task definition.
- Service: Resource that ensures a defined number of tasks run and manages scaling/deployment.
- Capacity Provider: Abstracts how ECS gets compute; in ECSMI, this involves specifying "instance attributes" (vCPU, memory, GPU, etc.).
- Attributes: Granular filters (e.g., burstable instances, CPU type, accelerators) to help AWS select matching EC2 instances.
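Capacity provider attributes act as filters over the EC2 instance catalog. The Python sketch below only simulates that idea locally: the `InstanceType` and `Requirements` classes and the `candidates` function are hypothetical, loosely modeled on EC2 attribute-based instance type selection (vCPU range, memory, accelerators, burstable performance), and the small catalog should be verified against the EC2 instance type documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_mib: int
    gpus: int
    burstable: bool


@dataclass(frozen=True)
class Requirements:
    """Hypothetical attribute filter, loosely modeled on EC2 attribute-based selection."""
    min_vcpus: int
    min_memory_mib: int
    max_vcpus: Optional[int] = None
    min_gpus: int = 0
    allow_burstable: bool = False


# Small illustrative catalog (verify specs against the EC2 docs).
CATALOG = [
    InstanceType("t3.medium", 2, 4096, 0, True),
    InstanceType("m5.large", 2, 8192, 0, False),
    InstanceType("c5.2xlarge", 8, 16384, 0, False),
    InstanceType("g4dn.xlarge", 4, 16384, 1, False),
    InstanceType("g5.xlarge", 4, 16384, 1, False),
    InstanceType("g4dn.2xlarge", 8, 32768, 1, False),
]


def matches(it: InstanceType, req: Requirements) -> bool:
    """True if the instance type satisfies every constraint in req."""
    if it.vcpus < req.min_vcpus:
        return False
    if req.max_vcpus is not None and it.vcpus > req.max_vcpus:
        return False
    if it.memory_mib < req.min_memory_mib:
        return False
    if it.gpus < req.min_gpus:
        return False
    if it.burstable and not req.allow_burstable:
        return False
    return True


def candidates(catalog: List[InstanceType], req: Requirements) -> List[str]:
    """Names of all instance types that pass the filter, in catalog order."""
    return [it.name for it in catalog if matches(it, req)]


if __name__ == "__main__":
    gpu_req = Requirements(min_vcpus=4, min_memory_mib=16384, max_vcpus=4, min_gpus=1)
    print(candidates(CATALOG, gpu_req))  # ['g4dn.xlarge', 'g5.xlarge']
```

In this sketch, leaving `min_gpus` at 0 does not exclude GPU instances; the requirements only narrow the candidate list, and the final choice among matching types is left to AWS.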
Getting Started: Step-by-Step Walkthrough
- Cluster & Networking: Create a new cluster or use an existing one; configure the VPC and security groups.
- IAM Roles:
- Instance profile (for EC2)
- Infrastructure role (for ECSMI to launch and manage instances)
- Capacity Provider:
- Use default or specify custom attributes for fine-grained control.
- Task Definitions:
- Mark compatibility as MANAGED_INSTANCES.
- Specify resources needed (CPU, memory, GPU if necessary).
- Container Images & ECR: Publish images and define environment variables.
- ECS Service:
- Attach capacity provider, set scaling (auto or desired count), and manage deployment.
- Note: In CDK, you must currently use the FargateService L2 construct, even for ECSMI, which is confusing.
- Deploy and Observe:
- Stack deploy triggers ECSMI to provision suitable EC2 infrastructure dynamically.
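The task definition step can be illustrated with a minimal Python sketch that builds the JSON document. Only `requiresCompatibilities: ["MANAGED_INSTANCES"]` comes from the episode; the `build_task_definition` helper, family name, image URI, queue URL, and resource sizes are hypothetical placeholders, and fields a real deployment would need (execution role, logging, possibly a network mode) are deliberately omitted.

```python
import json
from typing import Dict, Optional


def build_task_definition(
    family: str,
    image: str,
    cpu_units: int,
    memory_mib: int,
    gpu_count: int = 0,
    env: Optional[Dict[str, str]] = None,
) -> dict:
    """Return a minimal ECS task definition dict targeting Managed Instances."""
    container = {
        "name": family,
        "image": image,
        "essential": True,
        # ECS expects environment variables as a list of name/value pairs.
        "environment": [{"name": k, "value": v} for k, v in sorted((env or {}).items())],
    }
    if gpu_count > 0:
        # GPUs are requested per container via resourceRequirements.
        container["resourceRequirements"] = [{"type": "GPU", "value": str(gpu_count)}]
    return {
        "family": family,
        "requiresCompatibilities": ["MANAGED_INSTANCES"],
        # Task-level cpu (CPU units) and memory (MiB) are passed as strings.
        "cpu": str(cpu_units),
        "memory": str(memory_mib),
        "containerDefinitions": [container],
    }


if __name__ == "__main__":
    td = build_task_definition(
        family="transcriber",  # hypothetical name
        image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/transcriber:latest",  # placeholder
        cpu_units=4096,
        memory_mib=16384,
        gpu_count=1,
        env={"QUEUE_URL": "https://sqs.eu-west-1.amazonaws.com/123456789012/jobs"},  # placeholder
    )
    print(json.dumps(td, indent=2))
```

Saved to a file, the output could be registered with `aws ecs register-task-definition --cli-input-json file://taskdef.json` once the omitted fields are filled in; check the ECS documentation for what Managed Instances requires.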
CDK Support & Real-World Example
- Hosts' Example Repository:
- The team is releasing a CDK template for ECS MI, addressing a gap in official samples.
- The sample supports a GPU-requiring workload that scales to zero when idle, triggered via SQS.
- Two examples:
- Template Workload: Python container that pulls from a queue and processes messages.
- AI Workload: Uses OpenAI Whisper to transcribe podcast audio from S3, as a potential improvement over the current SageMaker-based implementation.
- Memorable quote on documentation (B, 16:05):
"[There's] a little bit of a lack of documentation and proper examples, but we are confident this is just something that's going to improve... so hopefully we can help a little with this one example."
- CDK Pitfalls:
- Still requires the FargateService L2 construct for ECSMI (confusing).
- Exposes misleading, unused attributes (e.g., Spot Instance options that are silently ignored).
- Some learning curve due to "newness" and sparse docs.
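The queue-driven, scale-to-zero pattern behind the template workload can be sketched in plain Python. This is not the code from the hosts' repository: `InMemoryQueue` is a hypothetical stand-in for SQS so the example runs locally (a real worker would call boto3's `receive_message` with `WaitTimeSeconds` and `delete_message`), and `desired_task_count` is a toy backlog-per-task rule whose `visible` and `in_flight` inputs would map to the SQS attributes `ApproximateNumberOfMessagesVisible` and `ApproximateNumberOfMessagesNotVisible`.

```python
import math
from collections import deque
from typing import Callable, Iterable, List, Protocol, Tuple


class MessageQueue(Protocol):
    """Minimal queue interface: receive (receipt, body) pairs, delete by receipt."""

    def receive(self, max_messages: int) -> List[Tuple[str, str]]: ...

    def delete(self, receipt: str) -> None: ...


class InMemoryQueue:
    """Hypothetical local stand-in for SQS (no visibility timeout, no retries)."""

    def __init__(self, bodies: Iterable[str]) -> None:
        self._pending = deque((f"r{i}", body) for i, body in enumerate(bodies))
        self.deleted: List[str] = []

    def receive(self, max_messages: int) -> List[Tuple[str, str]]:
        batch = []
        while self._pending and len(batch) < max_messages:
            batch.append(self._pending.popleft())
        return batch

    def delete(self, receipt: str) -> None:
        self.deleted.append(receipt)


def run_worker(
    queue: MessageQueue,
    handler: Callable[[str], None],
    batch_size: int = 10,
    max_empty_polls: int = 3,
) -> int:
    """Process messages until the queue stays empty, then return so the task can exit."""
    processed = 0
    empty_polls = 0
    while empty_polls < max_empty_polls:
        batch = queue.receive(batch_size)
        if not batch:
            empty_polls += 1  # with SQS long polling, each empty poll waits up to WaitTimeSeconds
            continue
        empty_polls = 0
        for receipt, body in batch:
            handler(body)          # if this raises, the message is never deleted
            queue.delete(receipt)  # delete only after successful processing
            processed += 1
    return processed


def desired_task_count(visible: int, in_flight: int, messages_per_task: int, max_tasks: int) -> int:
    """Toy backlog-per-task scaling rule that returns 0 when there is no work at all."""
    if messages_per_task <= 0:
        raise ValueError("messages_per_task must be positive")
    backlog = visible + in_flight
    if backlog == 0:
        return 0  # scale to zero: nothing queued and nothing in progress
    return min(max_tasks, math.ceil(backlog / messages_per_task))
```

Deleting a message only after the handler succeeds means a crash leaves it on the queue (with SQS, it becomes visible again after the visibility timeout), and counting in-flight messages keeps the service from scaling to zero while a long transcription is still running.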
GPU Support: A Game Changer
- ECSMI enables GPU support; Fargate does not. This was a major motivation for experimenting with and prototyping a Whisper-based AI transcription service.
- Memorable exchange (A/B, 20:09):
A: "I think you've asked publicly on the podcast many times for AWS to add GPU support to Fargate..."
B: "And Lambda."
A: "Oh yeah, good luck with that."
- Quota Note:
- GPUs require explicit quota increases from AWS; plan ahead to avoid launch delays.
- Performance Observations:
- Starting a single task took 3–4 minutes; further scaling and performance tests are still needed.
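Because EC2 on-demand quotas are expressed in vCPUs per instance family group rather than in instance counts, it helps to size the quota request from the maximum number of GPU instances you expect to run. A small Python sketch, with a hand-copied vCPU table for a few GPU instance types (verify against the EC2 docs) and hypothetical helper names:

```python
# vCPUs per instance for a few GPU instance types (verify against the EC2 docs).
GPU_INSTANCE_VCPUS = {
    "g4dn.xlarge": 4,
    "g4dn.2xlarge": 8,
    "g5.xlarge": 4,
}


def required_vcpu_quota(instance_type: str, max_instances: int) -> int:
    """vCPUs of on-demand quota needed to run max_instances of instance_type at once."""
    return GPU_INSTANCE_VCPUS[instance_type] * max_instances


def quota_shortfall(instance_type: str, max_instances: int, current_quota_vcpus: int) -> int:
    """How many more vCPUs to request; 0 means the current quota is already enough."""
    return max(0, required_vcpu_quota(instance_type, max_instances) - current_quota_vcpus)


if __name__ == "__main__":
    print(required_vcpu_quota("g4dn.xlarge", 3))  # 12
    print(quota_shortfall("g5.xlarge", 2, 0))     # 8
```

The quota is shared by every instance in the family group within a Region, so any other G-family instances already running in the account need to be added on top.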
Use Cases: Where ECS MI Shines
- Ideal Scenarios:
- Workloads needing advanced hardware (GPU, custom storage/networking)
- Users with Reserved Instances/Capacity Reservations who want simplified management
- Batch/queue-driven workloads that scale to zero (e.g., AI transcription, image processing)
- High performance computing (HPC) jobs: promising on paper, but further benchmarking is needed
- Consulting scenarios needing repeatable, low-ops GPU compute
- Sample Pattern:
"When there is work to do and you need very specific infrastructure for that work to happen, you just define all of it, put the work as a message in a queue and you know that it's going to scale up when needed and scale to zero when all the work has been completed." (B, 23:31)
When to Avoid ECS MI
- Strong Isolation Needs:
- Fargate runs each task in its own Firecracker microVM, which provides stronger isolation; ECSMI can place multiple tasks on the same host EC2 instance, reducing isolation.
- Custom AMIs Required:
- ECSMI uses only AWS-managed images (Bottlerocket); there is no custom AMI option.
- Host Access Needed:
- No SSH access to the host; ECS Exec reaches containers only, just like Fargate.
- Spot Instance-Driven Cost Savings:
- Not available, which is a clear drawback for many cost-sensitive workloads.
Memorable Quotes & Moments
- On managed vs self-managed trade-offs:
- "ECSMI is essentially trading some control and a margin on top of on-demand pricing for reduced operational work. And whether that trade is worth it really depends on your story and your constraints." (A, 07:45)
- On GPU quota requests:
- "They want to protect you from bill shock by making you ask explicitly for the really expensive instance types." (A, 20:54)
- On ECSMI's promise:
- "Overall we think this is a good addition to ECS. Suite pricing might be a bit of a downer, but that completely depends as always..." (A, 25:46)
Key Timestamps for Important Segments
- 00:00–05:18: ECS hosting spectrum, MI mode introduction, and pricing model breakdown
- 08:20–12:44: ECS/MI terminology deep dive—clusters, definitions, roles, capacity providers
- 12:44–16:05: Step-by-step setup, common pitfalls, and CDK support story
- 16:05–20:09: Example workloads, code sample, and learnings from implementation
- 20:09–21:32: GPU support and real-world use-case insights
- 21:32–24:47: Best use cases, scaling observations, and HPC discussion
- 24:47–end: Anti-patterns and when not to use ECSMI, closing thoughts on adoption, and hopes for future Spot support and documentation improvements
Final Reflections
The conversation was energetic, honest, and practical—highlighting both excitement and reservations around this new ECS capability. Eoin and Luciano see ECS Managed Instances as a worthy addition, especially as CDK support matures and if Spot Instances are added in the future. GPU support is real, performance is promising, and there's clear value for both AWS newbies and advanced users who want "just enough" control with minimal ops pain.
Call to action:
If you have experiences or questions about ECSMI, the hosts encourage listeners to reach out via comments or social media.
