A (11:19)
Yeah, let me try to explain how it is built. So again, in a nutshell, it's called S3Migrate. It tries to do something similar to AWS's aws s3 sync, but it allows you to provide two separate sets of credentials. That's probably the main difference from an idea perspective: you don't have to have one single set of credentials, you can provide two, one for the source and one for the destination.

The tool itself is written in Node.js, specifically in TypeScript, and it uses Commander.js for the CLI argument parsing and SQLite for data storage. We'll get into the details of that in a second, because it might sound weird right now. And of course it uses the AWS SDK for JavaScript version 3 to interact with S3-compatible endpoints. By the way, fun fact: if you look at most of these other providers, they all tell you to just use the AWS S3 SDK to interact with their APIs. That's actually a good sign that most providers are trying to be strictly compatible with those APIs, to the point that it's not even worth it for them to create their own clients, because you can just use the existing SDKs and clients. That made it a little easier for us, because we didn't need to learn a new set of libraries, or figure out, if we want this tool to work with multiple providers, whether we'd need some kind of abstraction layer where you plug in different SDKs. Thankfully, everything seems to work just fine with the AWS SDK for JavaScript.

Now, you might be asking the usual question here: why didn't you use Rust or Go? Of course this is something we could debate for hours, we could have a flame war of sorts. But long story short, I would have personally loved to write it in Rust, because I'm a big fan of Rust and I'm always looking for excuses to use it more. But honestly, given that we have tons of experience in Node.js and TypeScript, and this seems like a use case where lots of existing tooling can support you in Node.js and TypeScript, it was just much easier and faster to deliver the solution in TypeScript. The other thing is that, from a performance perspective, it is true that Rust could have made it a little faster, and maybe a little leaner from a memory perspective, it's not going to use as much memory. But at the same time, the real bottleneck here is networking speed. We are doing a progressive copy of the data, so networking is the real boss here. Even if we used Rust with multithreading and async I/O, the multithreading could have given us a way to parallelize the copy a little bit more, but there are other strategies that we put in place, and we'll talk about that later. So, yeah, this is why we didn't use Go or Rust. But maybe it's an exercise for somebody, if you want to try to do something similar in one of those languages.

As I said, the tool is fully open source and published on npm, so you can just use it today. And by using something like npx, you don't even need to install it: you can try it with just one command and see if it works for you.

Now, we mentioned that there are two sets of credentials. It works in a similar way to the AWS CLI or the AWS SDK, meaning that you can use the usual environment variables like AWS_ACCESS_KEY_ID, the endpoint, and so on. If you just use the basic variables, that's the default layer. But you can also override the source side by saying source AWS access key ID or source endpoint, and similarly you can override the destination, for instance destination AWS access key ID or destination endpoint. The tool also reads from .env files, so if you prefer to put all this information in a .env file because it makes your life easier, the tool is going to load it automatically if it exists in the current working directory.
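To make that layering a bit more concrete, here is a minimal sketch of how two independent S3 clients could be built from prefixed environment variables with the AWS SDK for JavaScript v3. This is not the actual S3Migrate source: the exact variable names (SOURCE_*, DESTINATION_*) and the dotenv-based .env loading are assumptions for illustration.

```typescript
// Sketch only: two independent S3 clients, one per credential set.
// The SOURCE_*/DESTINATION_* variable names are illustrative guesses.
import "dotenv/config"; // mirrors the automatic .env loading described above
import { S3Client } from "@aws-sdk/client-s3";

function clientFromEnv(prefix: "SOURCE" | "DESTINATION"): S3Client {
  // Prefer the prefixed override, fall back to the plain AWS_* variable.
  const pick = (name: string) =>
    process.env[`${prefix}_${name}`] ?? process.env[name];

  return new S3Client({
    region: pick("AWS_REGION") ?? "us-east-1",
    endpoint: pick("AWS_ENDPOINT_URL"), // S3-compatible providers expose custom endpoints
    credentials: {
      accessKeyId: pick("AWS_ACCESS_KEY_ID") ?? "",
      secretAccessKey: pick("AWS_SECRET_ACCESS_KEY") ?? "",
    },
    forcePathStyle: true, // path-style URLs tend to be safer across providers
  });
}

const sourceClient = clientFromEnv("SOURCE");
const destinationClient = clientFromEnv("DESTINATION");
```

With two clients like this, the rest of the tool can talk to the source and the destination completely independently, even when they live on different providers.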
Now, the way it's a little bit different from sync is that there are actually two phases. You don't just run one command and it starts the copy; you actually need to run two different commands. The first command is called catalog, and that's what we call the catalog phase. What it's going to do is a list operation on the source bucket, and it stores all the objects in a local SQLite database. The reason we do this is that it's effectively a mini state file, if you want. We decided to do it this way to get the resumability feature on one side: as we copy the files, we know exactly how many files there are to copy, so we can keep track of the progress and mark which ones have been copied. The other thing we can do, because we also store the metadata for all the objects as we discover them through the list operation, is use that to do the sorting. If you want to prioritize the files that are bigger, smaller or newer, you can do that, and behind the scenes the tool is going to run a different SQL query with a different sorting based on your parameters. So that's the reason for this intermediate step: to make it a little more flexible, to understand how many objects there are, to understand the current progress as you copy, and to do prioritization of different objects.

Once you have done the catalog phase, you end up with this state file, which is effectively a SQLite database. You can open it with any SQLite-compatible UI or CLI just to see what's inside. And with that you can start the copy phase. There is another command, s3migrate copy, where you specify the source bucket, the destination bucket and the state file, and of course through the environment you are providing all your credentials. This command is going to look at the state file, figure out what still needs to be copied, and start copying.

And of course, being a CLI utility, one of the challenges is that you need to run it on some kind of host system, or your own personal laptop, wherever; it needs to be a process that runs somewhere. And you need to control that process and make sure it's a long-running thing. So probably you're going to have some kind of remote machine somewhere, install the tool there, provide all the credentials, create the catalog, and then run the command and just monitor that the application is progressing without any issues.
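To tie the two phases together, here is a rough sketch of what the catalog and copy steps could look like with the AWS SDK for JavaScript v3 and a local SQLite state file. Again, this is not S3Migrate's actual code: the better-sqlite3 choice, the table layout, and the smallest-first ordering are assumptions made purely to illustrate the flow described above.

```typescript
// Sketch only: catalog the source bucket into SQLite, then copy what's pending.
// Schema, column names, and ordering are illustrative, not the tool's real ones.
import {
  S3Client,
  ListObjectsV2Command,
  GetObjectCommand,
  PutObjectCommand,
} from "@aws-sdk/client-s3";
import { Readable } from "node:stream";
import Database from "better-sqlite3";

// Built from the source/destination credentials, as in the previous snippet.
const sourceClient = new S3Client({ /* source credentials + endpoint */ });
const destinationClient = new S3Client({ /* destination credentials + endpoint */ });

const db = new Database("state.sqlite");
db.exec(`CREATE TABLE IF NOT EXISTS objects (
  key TEXT PRIMARY KEY,
  size INTEGER,
  last_modified TEXT,
  copied INTEGER NOT NULL DEFAULT 0
)`);

// Catalog phase: page through the source bucket and record every object,
// plus the metadata needed later for sorting and progress tracking.
async function catalog(bucket: string): Promise<void> {
  const insert = db.prepare(
    "INSERT OR REPLACE INTO objects (key, size, last_modified) VALUES (?, ?, ?)"
  );
  let token: string | undefined;
  do {
    const page = await sourceClient.send(
      new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
    );
    for (const obj of page.Contents ?? []) {
      insert.run(obj.Key, obj.Size ?? 0, obj.LastModified?.toISOString() ?? null);
    }
    token = page.NextContinuationToken;
  } while (token);
}

// Copy phase: read the pending objects in the requested order (smallest first
// here), stream each one from source to destination, and mark it as copied so
// an interrupted run can resume from where it left off.
async function copy(srcBucket: string, dstBucket: string): Promise<void> {
  const pending = db
    .prepare("SELECT key, size FROM objects WHERE copied = 0 ORDER BY size ASC")
    .all() as { key: string; size: number }[];
  const markCopied = db.prepare("UPDATE objects SET copied = 1 WHERE key = ?");

  for (const { key, size } of pending) {
    const src = await sourceClient.send(
      new GetObjectCommand({ Bucket: srcBucket, Key: key })
    );
    await destinationClient.send(
      new PutObjectCommand({
        Bucket: dstBucket,
        Key: key,
        Body: src.Body as Readable, // a Readable stream in the Node.js runtime
        ContentLength: size,        // known from the catalog, so we can stream
      })
    );
    markCopied.run(key);
  }
}
```

Because the progress lives in the SQLite file, killing the process and re-running the copy command would simply pick up the rows still marked as not copied, which is essentially the resumability behavior described above, and changing the ORDER BY clause is what a size- or date-based prioritization option would map to.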