Summary

AWS Bites Episode 154: S3 Files

Date: May 22, 2026
Hosts: Eoin Shanaghy & Luciano Mammino
Main Theme:
A comprehensive deep dive into AWS's new S3 Files service—a managed file system that exposes Amazon S3 storage as a network file system, promising familiar file system behaviors with S3 scalability and cost-effectiveness. Eoin and Luciano explore how S3 Files works, when it fits, comparisons with other solutions, practical setup advice, real project experiences, benchmarking findings, limitations, and cost considerations.

Episode Structure

[00:00] Introduction to S3 and the Motivation for S3 Files
[01:46] Why S3 Isn’t a File System
[05:22] Alternatives Used Before S3 Files
[07:16] How S3 Files Works
[09:49] Getting Started and Configuration
[13:42] Performance and Benchmarks
[19:48] Limitations, Caveats, and Real-world Experience
[27:52] Cost Breakdown
[29:50] Recap and Final Thoughts

1. Introduction and Context [00:00]

S3 is AWS's most used and beloved service due to its scalability and cost-effectiveness, but fundamentally it’s not a true file system.
S3 is a key-value object store, not a hierarchical file/folder system. This mismatch is problematic for cases where applications expect mounted drives, appendable files, or atomic operations.
The newly launched “S3 Files” aims to fill this gap by providing a managed file system on top of S3, retaining S3’s benefits but enabling classic file system behaviors.
Integrates with EC2, ECS, and Lambda.

Quote:

“S3 has got to be the most commonly used and most loved AWS service...but it’s not a file system.” — Eoin [00:00]

2. Why S3 Isn’t a File System [01:46]

S3 has no true concept of directories—folders are just key prefixes, not real directory abstractions.
No atomic moves or renames; these are essentially copy+delete operations, which can be slow for large files.
S3 objects are immutable; you can’t edit bytes in-place or append data (except via multipart upload, which doesn’t behave like standard file appends).
Listing objects is computationally expensive—there’s no metadata tree like a regular file system.
Access controls rely on IAM policies and ACLs, not POSIX-style permissions.
Performance depends heavily on how you spread your prefixes; hot prefixes can cause bottlenecks.
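
To make the points above concrete, here is a minimal sketch of a flat key-value store in Python. It is a toy in-memory model, not the S3 API: the class name ToyObjectStore and its methods are made up for illustration. It shows why a "folder" is only a shared key prefix and why renaming a prefix means one copy and one delete per object.

```python
# Toy in-memory model of a flat key-value object store (NOT the real S3 API).
class ToyObjectStore:
    def __init__(self):
        self.objects = {}  # key -> bytes; there are no directory entries at all

    def put(self, key, data):
        # Whole-object write: there is no in-place edit or append.
        self.objects[key] = data

    def list_prefix(self, prefix):
        # A "folder listing" is just a scan over keys that share a prefix.
        return sorted(k for k in self.objects if k.startswith(prefix))

    def rename_prefix(self, old, new):
        # No atomic rename: each object is copied to its new key, then deleted.
        copies = 0
        for key in self.list_prefix(old):
            self.objects[new + key[len(old):]] = self.objects[key]
            del self.objects[key]
            copies += 1
        return copies


store = ToyObjectStore()
store.put("logs/2026/a.txt", b"a")
store.put("logs/2026/b.txt", b"b")
store.put("data/c.txt", b"c")
moved = store.rename_prefix("logs/", "archive/")  # one copy+delete per object
```

The cost of rename_prefix grows with the number of objects under the prefix, which is the behavior Luciano contrasts with the near-instant rename of a real file system.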

Quote:

“What looks like a folder is basically just a key prefix...if you do that with a very large file...it takes some time, while in a real file system it’s pretty much instantaneous.” — Luciano [01:46]

3. Previous Workarounds and Alternatives [05:22]

S3FS-FUSE: Open source user-space driver to mount S3 as a file system. Popular for development and simple exploration.
Python S3FS (via fsspec): Widely used in big data workloads.
Hadoop S3A: Hadoop-compatible file system abstraction over S3.
Mountpoint for S3: AWS’s official high-throughput file system adapter.
FSx for Lustre: High-performance compute file system with S3 backends.
Amazon File Cache: A Lustre-based caching layer (EC2 only) that sits on top of S3 or NFS.
AWS Storage Gateway / S3 File Gateway: For on-premises to S3 translation, exposing data with SMB or NFS.

References to previous AWS Bites episodes for more detail: [Ep. 95] (mount options), [Ep. 124] (object storage).

4. How S3 Files Works [07:16]

S3 Files exposes an S3 bucket (or prefix) as a shared file system, accessible via NFS.
Can be mounted inside EC2, ECS, EKS, and Lambda (via EFS-like interface).
Files are streamed from S3 to the NFS mount; small files (default <128KB) are cached using an EFS backend for speed.
Users can configure which files are cached, cache expiration, and prefetch/import behavior for directories.
EFS acts as the cache engine under the hood, making the service familiar to anyone who’s set up EFS.
Quote:

“Files can be streamed from S3 to NFS mount, but in some cases the files can also be cached, and this is by default the case for smaller files.” — Luciano [07:16]
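
The read path described in this section can be sketched as a small decision function. This is only an illustration of the behavior discussed in the episode, assuming the default 128 KB threshold; the function name read_path and its return values are hypothetical and do not correspond to any AWS API.

```python
DEFAULT_IMPORT_LIMIT_BYTES = 128 * 1024  # default threshold mentioned in the episode


def read_path(size_bytes, already_cached, import_limit=DEFAULT_IMPORT_LIMIT_BYTES):
    """Return (where the read is served from, whether the file is cached afterwards).

    Assumption: a file is cached on first access only if it is smaller than
    the import limit; larger files are always streamed from S3.
    """
    if already_cached:
        return ("cache", True)
    will_cache = size_bytes < import_limit
    return ("s3", will_cache)


first_small = read_path(4 * 1024, already_cached=False)    # streamed, then cached
second_small = read_path(4 * 1024, already_cached=True)    # served from the cache
large = read_path(10 * 1024 * 1024, already_cached=False)  # streamed, never cached
```

Raising import_limit moves more files into the cache, which trades lower read latency for a higher cache bill.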

5. Getting Started: Configuring S3 Files [09:49]

Required Components:

S3 bucket
S3 Files file system resource (linked to S3 bucket/prefix and an IAM role)
- Expiration rules: How long cache objects are kept (default: 24 hours)
- Import rules: Which files/directories are imported/cached and size limits (default: 128KB)
Mount Targets: Needed for each AZ, specifying subnets/security groups + IPv4/v6.
Access Points: Optional, to set directory roots and POSIX IDs.
IAM Permissions: The file system's IAM role must trust elasticfilesystem.amazonaws.com and have EventBridge and S3 access (S3 Files uses EventBridge for S3 notifications).
Networking: No public endpoint; access is via VPC/subnet. Lambda and ECS require proper subnet/Security Group/NFS setup.

Quote:

“If you're used to AWS, it's a relatively simple one. It's not the most complex AWS service setup we've seen. And it's familiar.” — Eoin [12:59]
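
As a planning aid, the components listed above can be modelled with a couple of plain Python dataclasses. This is a sketch only: the class and field names (MountTarget, FileSystemPlan, missing_azs) and the example IDs are invented for illustration and are not CloudFormation or SDK property names. The defaults reflect the values mentioned in the episode.

```python
from dataclasses import dataclass, field


@dataclass
class MountTarget:
    availability_zone: str
    subnet_id: str
    security_group_ids: list


@dataclass
class FileSystemPlan:
    bucket: str
    prefix: str = ""                    # optional: link a prefix, not the whole bucket
    expiration_hours: int = 24          # default cache expiration from the episode
    import_max_bytes: int = 128 * 1024  # default import size limit from the episode
    mount_targets: list = field(default_factory=list)

    def missing_azs(self, required_azs):
        # One mount target is needed per Availability Zone you want to support.
        covered = {mt.availability_zone for mt in self.mount_targets}
        return sorted(set(required_azs) - covered)


plan = FileSystemPlan(
    bucket="example-bucket",  # hypothetical bucket name
    prefix="models/",
    mount_targets=[MountTarget("eu-west-1a", "subnet-aaa", ["sg-111"])],
)
gaps = plan.missing_azs(["eu-west-1a", "eu-west-1b"])
```

A check like missing_azs catches the common mistake of running compute in an Availability Zone that has no mount target.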

6. Performance Insights and Benchmarks [13:42]

AWS Documented Limits:

File systems support up to 5 GB/s write throughput and multiple TB/s of aggregate read throughput
Max per-client read throughput: 3 GB/s
Up to 250,000 read IOPS, 50,000 write IOPS
Latency:
- Reads from S3: tens of ms
- Reads from cache: sub-ms
- Writes: single-digit ms (staged on cache before S3 sync)

Real-World Testing (benchmarking):

Small file writes: 10–20% faster than S3’s put-object.
Small file reads: 5–10x faster (when cached).
Large file reads: Much faster with S3 Files than with the EFS options, possibly because of parallel fetching when streaming directly from S3.
Large file writes: Similar across all tested options.
For Lambda/Fargate: Memory allocation is key; low-memory configurations (512 MB or less) had I/O up to 10x worse than 2 GB configurations, and 10 GB always performed best.
S3 Files is strongest in mixed/medium file size cases, which gives it broad appeal.
Quote:

“Small file writes are generally slightly faster than S3. Put object by about 10 to 20%... The reads are generally dramatically faster than S3 directly, which you would expect, you know, for cached data like 5x to 10x faster.” — Eoin [16:09]
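
The latency figures above suggest a rough way to reason about average read latency as a function of cache hit ratio. The sketch below is back-of-the-envelope arithmetic, not a benchmark: the 0.5 ms and 30 ms values are assumed stand-ins for "sub-millisecond" and "tens of milliseconds", and the function name is hypothetical.

```python
CACHE_READ_MS = 0.5  # assumption standing in for "sub-ms" cache reads
S3_READ_MS = 30.0    # assumption standing in for "tens of ms" S3 reads


def expected_read_latency_ms(hit_ratio, cache_ms=CACHE_READ_MS, s3_ms=S3_READ_MS):
    """Weighted average of cache and S3 latency for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * s3_ms


mostly_hot = expected_read_latency_ms(0.9)   # 0.9 * 0.5 + 0.1 * 30
mostly_cold = expected_read_latency_ms(0.1)  # 0.1 * 0.5 + 0.9 * 30
```

Under these assumed numbers, the few cache misses dominate the average even at a 90% hit ratio, which is one reason the hosts recommend benchmarking with your own file sizes and access patterns.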

7. Limitations, Caveats, and Real-world Experiences [19:48]

Limitations:

NFS protocol only (no SMB, etc.).
Missing features: No hard links, no atomic renames; large rename/move ops can be extremely slow.
Writes are staged—there’s a 60-second write-back delay (resets on additional writes), meaning data is eventually consistent between S3 and the file system.
Eventual consistency warning: If you’re mixing direct S3/API reads/writes with S3 Files mounts, you may see lagged data.
No cross-account bucket mounting: S3 Files cannot currently mount buckets from another AWS account, even within the same Org/Region—problematic for multi-tenant SaaS architectures.
Rename/move operations can lead to significant delays for large prefixes (the user must explicitly accept the risk if exceeding 12 million objects).

Quotes:

“There is a delay that comes from this optimization...you might not see that the new data as soon as you might expect.” — Luciano [21:32]
“If you have another write operation in that 60 second period, the 60 seconds will restart. So it can take multiple minutes before the file is actually right.” — Eoin [22:17]
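
The resetting 60-second write-back window behaves like a debounce timer. The following sketch simulates it for a single file, assuming the 60-second figure from the episode; the function name sync_times is hypothetical, and treating a write that lands exactly on the deadline as starting a new window is an assumption.

```python
WRITE_BACK_DELAY_S = 60  # write-back delay mentioned in the episode


def sync_times(write_times, delay=WRITE_BACK_DELAY_S):
    """Given write timestamps in seconds, return when a sync to S3 would fire.

    Every write restarts the timer, so a sync only happens once the file has
    been quiet for `delay` seconds.
    """
    syncs = []
    deadline = None
    for t in sorted(write_times):
        if deadline is not None and t >= deadline:
            syncs.append(deadline)  # the file stayed quiet long enough
        deadline = t + delay        # each write restarts the window
    if deadline is not None:
        syncs.append(deadline)
    return syncs


bursty = sync_times([0, 30, 70])  # each write lands inside the previous window
spaced = sync_times([0, 100])     # the second write arrives after the first sync
```

In the bursty case the first byte written at t=0 does not reach S3 until t=130, which is the "multiple minutes" effect Eoin describes. A process that writes continuously, such as a streaming logger, would keep pushing the deadline out.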

Real-world Story:

Use case: SaaS app with a control plane bucket needing to be shared with tenant accounts. S3 Files couldn’t be used cross-account, so separate bucket sync logic had to be built.
For workloads with lots of renames or append-only logs, delays or operational quirks must be considered.

8. Cost Considerations [27:52]

S3 storage/ops: Charged as usual, based on size, ops, and class (including cheaper infrequent-access and Glacier tiers).
Cache/storage: ~$0.30 per GB per month for cache (regional variation applies, e.g., $0.57 in Sao Paulo).
Cache reads: ~$0.03 per GB.
File writes: ~$0.06–0.07 per GB.
These prices align with EFS Elastic Throughput.
Economic Sweet Spot: When most data is cold (not in cache), or files are large/infrequently accessed—less cache means lower cost, possibly cheaper than EFS for many workloads.

Quote:

“Files that aren't in your cache are read from S3 with no additional cost...there is a premium compared to just using S3. In exchange for that, you get fast read and write performance, especially on cached small files.” — Eoin [28:07]
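
The pricing above can be turned into a quick estimate of the premium S3 Files adds on top of normal S3 charges. This is a rough sketch using the approximate per-GB figures quoted in this section (with $0.06 taken from the low end of the write range); actual prices vary by region, and the function name and example workload are made up.

```python
CACHE_STORAGE_PER_GB_MONTH = 0.30  # approximate figure from the episode
CACHE_READ_PER_GB = 0.03           # approximate figure from the episode
WRITE_PER_GB = 0.06                # low end of the quoted $0.06 to $0.07 range


def monthly_premium_usd(cached_gb, cache_read_gb, written_gb):
    """Estimated monthly S3 Files charges on top of regular S3 storage and requests."""
    return (
        cached_gb * CACHE_STORAGE_PER_GB_MONTH
        + cache_read_gb * CACHE_READ_PER_GB
        + written_gb * WRITE_PER_GB
    )


# Hypothetical workload: 50 GB resident in cache, 1 TB read from cache, 200 GB written.
estimate = monthly_premium_usd(cached_gb=50, cache_read_gb=1000, written_gb=200)
```

Reads of uncached files add nothing here, which is why a mostly cold data set keeps the premium small.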

9. Summary and Final Take [29:50]

S3 Files is a practical compromise, bridging classic file system behavior with S3’s economics, but it’s not a magic replacement for a true file system.
Best used where applications expect file mounts, but your data is mostly cold or read-heavy, and you benefit from S3’s pricing at scale.
Fully understand staged writes and potential for eventual consistency before adopting, especially in mixed-access or multi-account scenarios.
Users should benchmark their own workloads to determine suitability, especially paying attention to file sizes and access patterns.

Quote:

“Our benchmark suggested S3 files can be quite interesting for workloads where you are doing lots of small file reads...if you're only reading small files and they all end up in the cache, maybe the cost is going to be quite high and as comparable as just having the files directly in efs.” — Luciano [30:50]

Closing invitation:

“We'd be really good to hear. Have you used it, what did you find?...We are always learning from our listeners and we're always eager to hear your stories.” — Luciano [31:54]

Notable Quotes & Moments

On S3's architectural mismatch:

“[S3] doesn’t have the concept of directories. What looks like a folder is basically just a key prefix.” — Luciano [01:46]
On real-world limitations:

“If you do provisioned iops, it can be very expensive...the best performance always came from having 10 gigabytes of RAM [in Lambda/Fargate].” — Eoin [15:14]
On surprise results:

“Large file reads are generally much, much faster with S3 files compared to EFS options...possibly from smart parallel fetching.” — Eoin [16:51]
On cross-account headaches:

“Unfortunately, because the cross account mount doesn't work, what we needed to do in the end is to figure out a synchronization mechanism where we can selectively replicate data...” — Luciano [25:14]
On performance bottlenecks:

“If you have millions of objects and just try to rename the top folder...that could take hours.” — Luciano [26:45]

Key Takeaways

S3 Files fills an obvious gap for AWS users—offering file system semantics on top of S3 for traditional apps—via an EFS-backed cache layer.
Setup is reminiscent of EFS file systems: mount targets, access points, VPC configuration.
Excellent for mixed-size, read-heavy workloads, especially where objects are accessed irregularly.
Not suitable for everything—limitations with atomicity, write delays, cross-account sharing, and certain high-churn scenarios.
Cost model is promising compared to pure EFS for many use cases but could escalate if your cache is always full and hot.

For full technical deep-dive, config snippets, and benchmark results, check the episode’s description for GitHub links.


Transcript

[00:00]
A
S3 has got to be the most commonly used and most loved AWS service. It's simple to get started with, largely cost effective compared to alternatives, and scales massively. But it's not a file system. It follows a key-value object store model, and this makes it a bit of a misfit in cases when you want to use it like a standard folder using regular file operations. Normal file systems are usually required for things like databases, applications that write or append to log files, web applications or CMS apps that assume a mounted folder for their data. And even though S3 is well supported for things like big data and batch processing workloads, it can actually become a performance bottleneck if you've got lots of tiny files. Now, AWS has just released S3 Files. This is a new managed file system backed by S3. S3 Files tries to be the sweet spot, giving you a proper file system with S3 underneath. The promise of it is that you'll get the scalability, durability and cost benefits of S3 with the performance and behavior of a file system. And one of the big benefits is that it integrates easily into EC2, ECS, and even Lambda, unlike some of the previous options. We're going to dive very deep into S3 Files, talk you through how to use it, where it fits, and how it compares against all your file storage options. You'll hear us share our experience since we have also been using S3 Files in real-world projects, and we also did some benchmarking. I'm Eoin, I'm here with Luciano. Welcome to AWS Bites episode 154. Luciano, maybe you can start off by telling everyone why is S3 not a file system?
[01:46]
B
In brief, yeah, the first reason is because it doesn't have the concept of directories. What looks like a folder is basically just a key prefix. You can use the familiar slash to make it look like there are directories, but it's just a naming convention. And in that sense it doesn't have true directories. You cannot do things like atomic moves or renames. For instance, if you just want to rename a file or move it from one place to another, which basically means changing the prefix entirely, even if you use the aws s3 mv command, you are effectively copying the object from one place to another. And if you've ever done that with a very large file, a very large object on S3, you might have noticed that it takes some time, while if you do that in a real file system, it's pretty much instantaneous because it's just literally renaming the file itself and not moving data or copying data. So that's another big difference, which is interesting. The other thing is that objects are immutable. You cannot modify a range of bytes inside an object like you could do in a file system. And in that sense you cannot really append to an existing object either. There is a little bit of an exception, which is multipart uploads, but it doesn't really work in the way you might expect from a regular file system. And if you want to deep dive into the details, we talked about that in episode 124. So go and check that out if you're curious. Another thing is that listing either the entire bucket or a prefix can be expensive, or at least more expensive than in a regular file system. And this is because there is no directory metadata in the same way that it would exist in a file system.
[03:38]
A
So.
[03:39]
B
So when you're running a list objects operation, it's effectively querying over the keys in the bucket. Access control is something that exists in S3 buckets, but again, it's different from what you would have in a POSIX file system, for example, because you generally determine access control using IAM policies or ACLs, and not by setting up, I don't know, users and groups as you would in a POSIX file system. And performance is also a little bit different because it's defined by how you structure your partitions by using prefixes. So if you put lots of objects in the same prefix, you can effectively damage your performance. So it's typical to distribute your data over different partitions by using a bunch of different sub-prefixes, which makes querying S3 faster. So these are specific tricks you would do with S3 that don't necessarily map to things you would do with a regular file system. And of course there are a lot more subtle differences. And the real issue here is that there are applications out there that need file system semantics. So when you try to use S3 and mimic a file system, sometimes you might bump into things that don't necessarily match the abstraction that the application expects. So that's something to be aware of, and we'll see today how S3 Files tries to fill that gap. But I guess before getting into S3 Files, which is pretty new, this is a topic that people have been trying to address for a long time. So what are the other solutions that have existed for longer than S3 Files, at least?
[05:23]
A
Well, you have the FUSE user-space file system option, which supports lots of file systems, and there's an s3fs-fuse library which allows you to do that. A lot of people do that in development, you know, just to be able to explore buckets in a file system. There's also the very popular Python fsspec s3fs library. It's a different s3fs, which is used in a lot of big data applications. Hadoop also has S3A, which is like an HDFS abstraction on top of S3. And more recently, in the last couple of years, you have Mountpoint for S3, which is Amazon's own file system adapter. We did a whole episode on this, actually, episode 95. In fact, that episode covers a lot of the options out there for mounting S3 as a file system, at least before S3 Files came along. Now, on top of that, there are actually a whole load of AWS services that provide a bridge layer between S3 and file systems, like FSx for Lustre. Now this is a file system for really high performance computing with S3 as a backing data repo. If you're in the HPC space, that's one you'll come across quite frequently. There's Amazon File Cache. This is one that never really made it mainstream, I feel. But it's also a high performance option. It's a general caching layer for EC2 only that works on top of S3 or can work on top of NFS. And it's built on top of Lustre, which is one of those high performance file systems. The other one I can think of is Storage Gateway, which is this whole suite of services mostly for connecting on-premises storage to AWS. One of those is called S3 File Gateway and it can present S3 as an NFS or an SMB file share. Now we've covered all of that in a previous episode, episode 95. So let's get straight into S3 Files. How does it work?
[07:16]
B
Yeah, what we mentioned already is that S3 Files makes a normal S3 bucket accessible as a shared file system. So you can still use the S3 bucket as normal, but you can also mount it as a file system. Any change you make in the file system is eventually reflected in the S3 bucket. And you can access S3 Files from a bunch of different services in AWS like EC2, ECS, EKS, and even Lambda using NFS. So data isn't just stored in the bucket itself, but you are effectively seeing it in those compute layers as if it was a file system normally available in those compute instances. The interesting thing is the way the connection with S3 is managed by S3 Files, because files can be streamed from S3 to the NFS mount, but in some cases the files can also be cached, and this is by default the case for smaller files. So the idea is that the first time you are accessing that file through the file system mount, the file is going to be streamed, but also cached in an intermediate layer, which is going to give you increased throughput and lower latency. So effectively the next time you try to access that same file, the read is going to be much faster and you will have a much higher throughput. This is something you need to be aware of because it's one of those things you can configure for performance. So it really depends on your use cases, the size of the files you are managing, and what you want to be in the cache versus what you want to always be streamed directly from S3. But by default, files smaller than 128 kilobytes are cached. And yeah, if you need to change that, you can do it. And you should benchmark your use cases to see if that actually improves performance or even changes your cost trade-off. We'll talk more about costs in a second. Now, EFS is used under the hood to provide the caching layer. So your mental model could be that it's going from S3 to EFS and then from EFS to the compute level if you are using this caching mechanism.
Otherwise it's just streamed directly from S3. And this is probably why you can see that this storage is available for Lambda as well. Because as you know, EFS is also something you can use with Lambda. So in general, everywhere you can use EFS, it's very easy to see why they made S3 Files available. Now, with all of that being said, how do we get started?
[09:49]
A
Yeah, the ingredients list for this is not too long actually. So you need an S3 bucket, if that wasn't already obvious. And then the next thing you'll create is the S3 Files file system. This is a resource that's linked to your bucket and a file system IAM role. And you can link it with a specific prefix in your bucket rather than the whole bucket. And then within this file system resource you define a couple of rules. So you've got the expiration rules, which say how long data hangs around in the cache. I think it's 24 hours by default, but you can set it up to, oh, I can't remember, but it's a lot longer than that. And then you have the import rules. The import rules allow you to say what the maximum file size is for cached data. So that's 128 kilobytes by default, but you can set it higher than that if you like, and you say whether the data is imported automatically when a directory is first accessed, or only when the file is first accessed. So that could be quite useful in that if you access one file, S3 Files can go ahead and import everything else in the directory within the size threshold, if you like. And these import rules can be different for different folders. It's worth mentioning that the IAM policy or the file system policy we mentioned needs to trust elasticfilesystem.amazonaws.com, so that'll give you a hint about how tightly integrated it is with EFS. But it also needs EventBridge and S3 access. So S3 Files uses EventBridge under the hood for S3 notifications. The next thing you need is a mount target, and this is the network link between your S3 Files file system and the subnets in your VPC. You need one of these for each Availability Zone you want to support. So for each of these mount targets you'll provide a subnet and security groups, and then say whether you want to support IPv4 or IPv6. And last thing, you can also create an access point.
If you don't do this, you just mount the file system itself. But access points allow you to provide specific directories as the root and provide a specific POSIX user ID and group ID. So mount targets and access points look exactly like the same things you have in EFS. They have the same name. The file system resource is quite similar too. But instead of configuring the EFS performance and throughput options, which you don't have here, you just link to your bucket and set the import and expiration rules. As I mentioned, since this is EFS, it's a service that you have to access over the VPC. There's no public endpoint like you have with normal S3 access. For Lambda users, that means your function should be set up with security groups and subnets. And when it comes to mounting your file system, it's just like EFS. In Lambda, you provide the file system or the access point ARN and the mount path. And for ECS, you'll create a container volume from the file system or the access point, and those volumes are then mount points in your ECS task definitions. Because it uses S3, and we talked about how it'll stream directly from S3 for large files, you need to have either Internet access from your subnet or VPC endpoints to S3. And your security groups, because it uses NFS, will need the NFS port outbound. That's 2049, I think. All in all, if you're used to AWS, it's a relatively simple one. It's not the most complex AWS service setup we've seen. And it's familiar.
[13:23]
B
Right.
[13:24]
A
It follows EFS very closely. There are some more things you can configure, like file system policies, resource policies to get tighter access control. So I think that's probably all we can say about what S3 files is, how to set it up. So let's talk about how it behaves. Maybe starting first with performance, because this is pretty important.
[13:43]
B
Yeah, I'm going to read straight from the AWS documentation just to be sure I give you the right numbers. And basically each file system supports up to 5 gigabytes per second of write throughput performance. And they say multiple terabytes per second of aggregated read throughput, up to 250k of read IOPS performance and 50k of write IOPS performance. The maximum per-client read throughput is 3 gigabytes per second. And when accessing files that aren't cached in your file system, the file system needs to first retrieve the data from the S3 bucket, which has latencies in the tens of milliseconds. Data stored in the file system is read with low sub-millisecond latencies. Writes are staged on the file system with single-digit millisecond latencies. So that's all that AWS has to say about performance. My personal experience is that it might be a little bit tricky to put all these figures together, depending on the way you use S3 Files and depending on the shape of your data in S3: do you have big files or small files, how much do you access all of them, and how are you setting up the caching layer? So in reality, I would treat these numbers just as rough guidance, just to have a high-level understanding. But I would recommend people do their own benchmarks and see whether it really works for their use cases. So that, I think, moves us into our experience. Eoin, do you want to share anything about that?
[15:14]
A
Yeah. The nice thing I think there is that, compared to just normal EFS, EFS can be a bit complex because it's got these different modes. The latest one is elastic throughput. And then you've got IOPS, right? You've got provisioned IOPS versus burst. The old burst method can be difficult to get a handle on, and if you do provisioned IOPS, it can be very expensive. So what we wanted to do was just do a bit of benchmarking so we'd have a better idea of when to use this in our projects as well, and where the sweet spot was, but also to share with everyone. So we wrote a fairly simple benchmarking application. It's not totally scientific because there's always other influencing factors here. We tried to make it as simple as possible. And the approach was we wanted to do reads and writes of small and large files to S3. So not that large, but just above the threshold where they would be cached, and just measure the performance. And we also wanted to compare it against all the different EFS configurations you can have and then run it on Lambda and Fargate. But also not just Lambda and Fargate. We wanted to do different CPU and memory configurations. So we chose a few different configuration sizes between 256 MB of memory and 10 gigs of memory in both Lambda and Fargate. Now the repo and results summary will be up on GitHub if you want to see it for yourself, and the link will be in the description. The SAM CloudFormation template we created might also be useful, just if you're figuring out how to write one of these things and get it running yourself with S3 Files. I think we both had a few cases of trial and error trying to get the first one working. So we have something that now works. So what are our overall findings? Well, small file writes are generally slightly faster than S3 PutObject, by about 10 to 20%. That's what we can see. The reads are generally dramatically faster than S3 directly, which you would expect, you know, for cached data, like 5x to 10x faster.
The large file reads are generally much, much faster with S3 Files compared to EFS options. And that's because we know it's streaming directly from S3. It's still a bit of a surprise to us. We would have thought that those large file accesses through the different EFS options, especially with provisioned IOPS, would be faster. But this is what our results are showing, so it's interesting. Maybe there's some smart parallel fetching going on there in the background. In terms of large file writes, there's very little difference. It's consistent across the board. Otherwise I would say the performance of S3 Files is quite similar to EFS, with a tiny extra bit of latency for S3 Files. So the only case where S3 Files is significantly faster is in the read operations. Now, since we tested with various memory and CPU configurations, it's worth calling out a general observation which isn't specific to S3 Files. If you have Fargate or Lambda with 512 megabytes or less, the network bandwidth really hurts, and I/O can be 10 times worse than with just 2 gigabytes of RAM. The best performance always came from having 10 gigabytes of RAM. For Lambda, we know that memory allocation is proportionally tied to CPU and network, and it seems like there's a similar correlation for Fargate too. So if you're doing anything that's I/O sensitive, having a memory allocation of 2 gigabytes or more, if possible, will really help you. The results just start to fall off a cliff if you try to be really frugal when it comes to allocating those resources. There's lots of benefits here and, you know, your mileage will vary depending on your workload. But in general, I'd say this is not one of those services that has a very narrow niche. I think it has a broad set of use cases.
If you've got a lot of hot data where everything's hitting the cache, maybe it's not going to work very well for you. But in a normal case where you don't access most of the data most of the time, but you can benefit from some speed-up, especially with small files, and you like the file system model, then I think it's a real winner. Right, after that, what's the downside?
[19:48]
B
Yeah, there are quite a few and it's well worth being aware of them before just jumping straight into using S3 Files for everything. So the first one, which we already mentioned to some extent, is that this supports NFS only. So it doesn't support other network file share protocols like SMB. The file system behavior, as we said, is trying to do the best that it possibly can to fill the gap between what S3 can do and what a regular file system is supposed to do. So there are of course things that are missing or that cannot easily be replicated. For example, there are no hard links and no atomic renames. And an interesting thing is that if you try, for instance, to modify a file from the file system, these changes are staged in the file system and then eventually they are synchronized to S3, which is kind of an interesting optimization, but comes with some potentially unexpected side effects that we'll explain in a second. But the idea is that basically you might be writing to a file over a few different operations in the local file system. Maybe the common example is that you are appending a few times within a few seconds. Especially if this is a large file, you wouldn't want it to immediately try to synchronize back to S3 every time you make a change, because if it's a large file, you are doing a lot of unnecessary writes into S3. So effectively what S3 Files is doing for you is waiting for a certain amount of time to see whether you are doing any other writes to this object before it's actually synchronized back in one atomic operation into S3. So the issue here is that you need to be aware that there is a delay that comes from this optimization.
So if you are using a mix of access patterns, for example, if you are reading from the file system in one place, also writing to it from the file system, but then also reading directly from S3 somewhere else, you might see that there is a little bit of eventual consistency. So you might not see the new data as soon as you might expect. So just be aware of that. I think the default is 60 seconds. I'm not even sure if it's something you can configure. But yeah, be aware that the consistency isn't immediate. So watch out for eventual consistency.
[22:18]
A
And this 60-second wait, if you've got this, is a period where it will wait to see if there are more write operations. So if you do have another write operation in that 60-second period, the 60 seconds will restart. So it can take multiple minutes before the file is actually written, in that case.
[22:34]
B
Yeah, I can imagine one of the use cases being that you are streaming logs from somewhere and piping them to the file system. If these logs keep streaming for the entire duration of the application, you're never going to see that file, or at least those changes, reflected in the S3 bucket itself. So for some of these use cases you really need to understand the model to see whether everything is going to work for you using the synchronization primitives that S3 Files provides. Another thing that I actually bumped into myself in one of the projects we are working on with one of our customers is this: if you use S3 Files and you have an organization with multiple accounts, and maybe a control plane account with a bucket of data you want to share with a bunch of other accounts, you would expect that you could use S3 Files with a bucket that exists in another account, at least if it's in the same organization and region. But that doesn't seem to be the case. I couldn't find an explicit mention in the documentation that this is a real limitation, but everything I tried just didn't work. So my conclusion is that this feature is not supported yet. I'm hoping it's somewhere on the AWS roadmap, because I think it could be very useful. And to help you understand why, I want to share a little bit about the project we are working on and why we thought this could be a good idea. We are building a SaaS application where different customers get their own dedicated accounts, so they can run some kind of modeling workloads in their own isolated accounts. So effectively the shape of the accounts is that there is a central control plane where we have all the shared resources. For instance, all the models that the SaaS exposes are stored in S3 and organized under specific prefixes.
And then every time we onboard a new tenant, a new account is created and added to the organization, and that tenant is going to have access only to a subset of these models. So we also need to put in place a mechanism that allows each tenant to read only the models they have access to from this kind of golden bucket that exists in the shared control plane. So we thought that S3 Files, with the feature that allows you to mount file systems only on specific paths, would be a good match. Also, these models are effectively immutable: we upload them once and they never change. If they do change, we upload an entirely new version. So we don't even have the issues we just described around synchronizing the data. We literally just needed to have an EFS mount from S3, and S3 Files seems to be pretty good with all its caching mechanisms, while we don't suffer from the eventual consistency. Unfortunately, because the cross-account mount doesn't work, what we needed to do in the end was figure out a synchronization mechanism where we selectively replicate data from the central bucket in the control plane to a bucket that exists in each tenant account, which is a little bit annoying. It works in the end; it just adds a bit of extra complexity that we didn't want. We don't really have benchmarks for this specific project, where we are going to have quite a varied mix of big and small files depending on the model. So maybe later on, if this is interesting, we might do another episode with the details once we start to use it with more customers and get some more realistic data. But yeah, so far the only annoyance is that you cannot do cross-account mounts. Otherwise S3 Files has been working really well for this project.
There is one more caveat that I think is worth mentioning: if you rename and move files a lot, that can affect performance. If you have a prefix with millions of objects and you try to rename that prefix, for example by renaming the top folder in the file system, that could take hours. So that's just something to be aware of. And I think there is also some kind of limitation, right, Eoin? AWS will warn you that this might happen and you need to accept the warning explicitly.
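As a rough illustration of the selective replication described above, the sketch below only shows the selection step: given the keys in the central bucket and the prefixes a tenant is allowed to see, it returns the keys that would need copying to that tenant's bucket. The function name, the example keys and the prefix layout are all hypothetical, and the actual copy step (for instance a replication rule or a copy job) is deliberately left out.

```python
from typing import Iterable, List


def keys_for_tenant(all_keys: Iterable[str], allowed_prefixes: Iterable[str]) -> List[str]:
    """Select the object keys a tenant may read, based on key prefixes.

    Hypothetical helper: the prefix layout is an assumption, not the
    structure used in the project discussed in the episode.
    """
    prefixes = tuple(allowed_prefixes)
    # str.startswith accepts a tuple and matches any of its entries;
    # an empty tuple matches nothing, so a tenant with no grants gets no keys.
    return sorted(key for key in all_keys if key.startswith(prefixes))


if __name__ == "__main__":
    central_bucket_keys = [
        "models/alpha/v1/weights.bin",
        "models/alpha/v2/weights.bin",
        "models/beta/v1/weights.bin",
    ]
    print(keys_for_tenant(central_bucket_keys, ["models/alpha/"]))
```

Because the models are described as immutable, a selection like this only needs to run when a new version is uploaded or when a tenant's access changes.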
[27:02]
A
Yeah, it will actually look at how many objects are in your bucket before you create a file system. The documentation says that if you've got something like 12 million objects, there's a potential for a four hour rename operation, and it will give you an error unless you accept a warning saying it's okay, I'm willing to accept the risk and avoid it.
[27:26]
B
So this is technically something that might fail at deployment if you accidentally.
[27:31]
A
Unless you add an explicit accept warning property to your configuration.
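The rename figures quoted above (roughly 12 million objects for a potential four hour rename) can be turned into a back-of-the-envelope estimate. The sketch below assumes the duration scales linearly with object count, which is an assumption made here for illustration and not something stated in the episode. The function and constant names are made up.

```python
# Figures quoted in the episode.
QUOTED_OBJECTS = 12_000_000
QUOTED_HOURS = 4.0


def estimated_rename_hours(object_count: int) -> float:
    """Estimate a worst-case prefix rename duration in hours.

    Assumption: linear scaling from the quoted data point, which works
    out to about 3 million objects per hour.
    """
    objects_per_hour = QUOTED_OBJECTS / QUOTED_HOURS
    return object_count / objects_per_hour


if __name__ == "__main__":
    for count in (1_000_000, 12_000_000, 50_000_000):
        print(f"{count:>11,} objects: about {estimated_rename_hours(count):.1f} hours")
```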
[27:37]
B
Okay, so hopefully that gives you an idea of some of the trade-offs, limitations and missing features, so you are a little bit more informed when you decide to use S3 Files for your projects. But I think we need to talk about cost. So what's the story there?
[27:53]
A
Okay, well, you've got the S3 pricing. You're always going to pay for that under the hood. Then on top of that, data stored in your cache is priced at around $0.30 per gigabyte or more depending on the region. What's the story with Sao Paulo? It's like 57 cents compared to 30 cents in us-east-1. I don't know. Brazilian listeners, please tell us how you feel about this and what you do about it. Reads from the cache are around $0.03 per gigabyte. Again, that's where the price starts. And file writes are around 6 or 7 cents per gigabyte. Now, if you're familiar with EFS elastic throughput pricing, you might notice that those prices are exactly the same. The main difference is that files that aren't in your cache are read from S3 with no additional cost. So that's the way to think about it. There is an additional cost here, but compared to EFS, there's a huge saving for reads that don't hit your cache. So in practice this means you can save a lot of money compared to EFS, but there is a premium compared to just using S3. In exchange for that, you get fast read and write performance, especially on cached small files. If all or most of your data is small and frequently accessed, it can all end up in the cache. So the costs can mount up and maybe it doesn't make sense there. On the other hand, if most of your data is larger or less frequently accessed, you might have the sweet spot for S3 Files. One thing we didn't mention yet is that this also works with all the different storage tiers in S3, like Infrequent Access, Glacier and everything. So you can still get those cost savings in your S3 layer. And yeah, otherwise there are those trade-offs you talked about, Luciano, particularly the big one, which is this 60 second write-back delay.
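To get a feel for the premium on top of plain S3, here is a small estimate built from the approximate prices quoted above. It is only a sketch: the numbers are the rounded starting prices mentioned in the episode, the cache storage price is assumed to be per gigabyte per month, the underlying S3 charges are left out, and the class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Approximate starting prices as quoted in the episode (USD)."""
    cache_storage_per_gb_month: float = 0.30  # assumed to be per GB-month
    cache_read_per_gb: float = 0.03
    write_per_gb: float = 0.06


def monthly_premium(cached_gb: float, cache_read_gb: float, written_gb: float,
                    prices: Prices = Prices()) -> float:
    """Estimate the monthly S3 Files premium, excluding regular S3 charges.

    Reads that miss the cache go to S3 and, per the episode, add no
    S3 Files charge, so they do not appear here.
    """
    return (cached_gb * prices.cache_storage_per_gb_month
            + cache_read_gb * prices.cache_read_per_gb
            + written_gb * prices.write_per_gb)


if __name__ == "__main__":
    # 100 GB held in the cache, 1 TB read from the cache, 50 GB written.
    print(f"${monthly_premium(100, 1000, 50):.2f}")
```

Keeping the prices in a separate structure makes it easy to plug in a different region, such as the higher Sao Paulo cache price mentioned above.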
[29:50]
B
Okay, so let's try to wrap this episode up. I'm going to do a quick recap first and then give you our final take. S3 is amazing. We all use it for all kinds of use cases, but the reality is that it's not a file system. There are still use cases where applications expect to have a file system, so you need to bridge the gap somehow. S3 Files is a new way to do that and it is pretty promising. It basically lets you mount S3-backed storage into EC2, ECS, EKS, and even Lambda using EFS concepts that you might have seen already if you have used EFS as a service. And this is not strange when you realize that under the hood, AWS is using EFS as a caching layer between reading from S3 directly and making that data available as a file system in EC2, ECS, EKS, or whatever. That's why you see a lot of EFS things, and they will be very familiar if you have used EFS. I think the interesting story there is that you can keep S3 as the source of truth but give applications a more traditional file system interface when they need it. You can also leverage this caching layer to improve performance whenever you are, for instance, reading lots of small files. And if you were using EFS before, there is a potential here to save money, because you are not always reading from EFS and you are not always keeping all the data in EFS, which is generally where most of the cost would come from if you use EFS with lots of big files, for example. Our benchmarks suggested S3 Files can be quite interesting for workloads where you are doing lots of small file reads, because in those cases reading directly from S3 might hurt performance. But again, if you're only reading small files and they all end up in the cache, the cost might be quite high and comparable to just having the files directly in EFS. So just be aware of that. I think having a good mix of file sizes is probably the sweet spot for S3 Files.
Now, with all of that being said, our take is that you still need to be aware that this is not magically turning an S3 bucket into a fully fledged file system. You should still understand the trade-offs and the limitations, and it is still kind of a middle ground. It is very practical, but look carefully at what your application is trying to do with the file system and make sure it's not trying to do things that are not supported. The other thing is to be careful about mixed access patterns. If you are using that bucket for S3 Files but also accessing the bucket directly for reads and writes, there might be synchronization issues. So make sure you truly understand the synchronization model, including the 60 second write-back limitation we discussed before. And with all of that in the picture, make sure your design still makes sense. If you're building a complex architecture, put everything in the picture before deciding that S3 Files doesn't work for you, because it might work under the right circumstances. Now, as always, and especially because this is such a new service, keep in mind that we have used it only for some experiments and in a project that is still very early. So these are just our early opinions and early findings on this service. It would be really good to hear from you. Have you used it, and what did you find? Have you used something else that you think works better for you, and why? Just let us know what your experiences are and what your opinion is if you think something is missing, because we are always learning from our listeners and we're always eager to hear your stories. And that brings us to the end of this episode. But before saying goodbye, we have to thank our sponsor, fourTheorem, for powering yet another episode of the AWS Bites podcast.
And fourTheorem can help you if you're trying to design reliable, cost-effective storage architectures on AWS, especially if you're using S3, EFS, and now S3 Files, or if you're building all kinds of serverless workloads with Lambda, containers, and anything else like that. So just check out fourtheorem.com to find out more about what we do and some of our case studies. Thank you very much and we'll see you in the next episode.