A (11:19)
Yeah, let me try to explain how it is built. So again, in a nutshell, it's called S3Migrate. It tries to do something similar to AWS's aws s3 sync, but it allows you to provide two separate sets of credentials. That's probably the main difference from an idea perspective: you don't have to have one single set of credentials, you can provide two, one for the source and one for the destination.

The tool itself is written in Node.js, specifically in TypeScript, and it uses Commander.js for the CLI argument parsing and SQLite for data storage. We'll get into the details of that in a second, because it might sound weird right now. And of course it uses the AWS SDK for JavaScript version 3 to interact with S3-compatible endpoints. By the way, fun fact: if you look at most of these other providers, they all tell you to just use the AWS S3 SDK to interact with their APIs. That's actually a good sign that most providers are trying to be strictly compatible with those APIs, to the point that it's not even worth it for them to create their own clients, because you can just use the existing SDKs and clients. That made it a little easier for us, because we didn't need to learn a new set of libraries, or figure out, if we want this tool to work with multiple providers, whether we'd need some kind of abstraction layer where you plug in different SDKs. Thankfully, everything seems to work just fine with the AWS SDK for JavaScript.

Now, you might be asking the usual question here: why didn't you use Rust or Go? Of course this is something we could debate for hours, we could have a flame war of sorts. But long story short, I would have personally loved to write it in Rust, because I'm a big fan of Rust and I'm always looking for excuses to use it more. But honestly, given that we have tons of experience in Node.js and TypeScript, and this seems like a use case where lots of existing tooling can support you in Node.js and TypeScript, it was just much easier and faster to deliver the solution in TypeScript. The other thing is that, from a performance perspective, it is true that Rust could have made it a little faster, and maybe a little leaner from a memory perspective, it's not going to use as much memory. But at the same time, the real bottleneck here is networking speed. We are doing a progressive copy of the data, so networking is the real boss here. Even if we used Rust with multithreading and async I/O, the multithreading could have given us a way to parallelize the copy a little bit more, but there are other strategies that we put in place, and we'll talk about that later. So, yeah, this is why we didn't use Go or Rust. But maybe it's an exercise for somebody, if you want to try to do something similar in one of those languages.

As I said, the tool is fully open source and published on npm, so you can just use it today. And by using something like npx, you don't even need to install it: you can try it with just one command and see if it works for you.

Now, we mentioned that there are two sets of credentials. It works in a similar way to the AWS CLI or the AWS SDK, meaning that you can use the usual environment variables like AWS_ACCESS_KEY_ID, the endpoint, and so on. If you just use the basic variables, that's the default layer. But you can also override the source side by saying source AWS access key ID or source endpoint, and similarly you can override the destination, for instance destination AWS access key ID or destination endpoint. The tool also reads from .env files, so if you prefer to put all this information in a .env file because it makes your life easier, the tool is going to load it automatically if it exists in the current working directory.
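To make that layering a bit more concrete, here is a minimal sketch of how two independent S3 clients could be built from prefixed environment variables with the AWS SDK for JavaScript v3. This is not the actual S3Migrate source: the exact variable names (SOURCE_*, DESTINATION_*) and the dotenv-based .env loading are assumptions for illustration.

```typescript
// Sketch only: two independent S3 clients, one per credential set.
// The SOURCE_*/DESTINATION_* variable names are illustrative guesses.
import "dotenv/config"; // mirrors the automatic .env loading described above
import { S3Client } from "@aws-sdk/client-s3";

function clientFromEnv(prefix: "SOURCE" | "DESTINATION"): S3Client {
  // Prefer the prefixed override, fall back to the plain AWS_* variable.
  const pick = (name: string) =>
    process.env[`${prefix}_${name}`] ?? process.env[name];

  return new S3Client({
    region: pick("AWS_REGION") ?? "us-east-1",
    endpoint: pick("AWS_ENDPOINT_URL"), // S3-compatible providers expose custom endpoints
    credentials: {
      accessKeyId: pick("AWS_ACCESS_KEY_ID") ?? "",
      secretAccessKey: pick("AWS_SECRET_ACCESS_KEY") ?? "",
    },
    forcePathStyle: true, // path-style URLs tend to be safer across providers
  });
}

const sourceClient = clientFromEnv("SOURCE");
const destinationClient = clientFromEnv("DESTINATION");
```

With two clients like this, the rest of the tool can talk to the source and the destination completely independently, even when they live on different providers.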
Now, the way it's a little bit different from sync is that there are actually two phases. You don't just run one command and it starts the copy; you actually need to run two different commands. The first command is called catalog, and that's what we call the catalog phase. What it's going to do is a list operation on the source bucket, and it stores all the objects in a local SQLite database. The reason we do this is that it's effectively a mini state file, if you want. We decided to do it this way to get the resumability feature on one side: as we copy the files, we know exactly how many files there are to copy, so we can keep track of the progress and mark which ones have been copied. The other thing we can do, because we also store the metadata for all the objects as we discover them through the list operation, is use that to do the sorting. If you want to prioritize the files that are bigger, smaller or newer, you can do that, and behind the scenes the tool is going to run a different SQL query with a different sorting based on your parameters. So that's the reason for this intermediate step: to make it a little more flexible, to understand how many objects there are, to understand the current progress as you copy, and to do prioritization of different objects.

Once you have done the catalog phase, you end up with this state file, which is effectively a SQLite database. You can open it with any SQLite-compatible UI or CLI just to see what's inside. And with that you can start the copy phase. There is another command, s3migrate copy, where you specify the source bucket, the destination bucket and the state file, and of course through the environment you are providing all your credentials. This command is going to look at the state file, figure out what still needs to be copied, and start copying.

And of course, being a CLI utility, one of the challenges is that you need to run it on some kind of host system, or your own personal laptop, wherever; it needs to be a process that runs somewhere. And you need to control that process and make sure it's a long-running thing. So probably you're going to have some kind of remote machine somewhere, install the tool there, provide all the credentials, create the catalog, and then run the command and just monitor that the application is progressing without any issues.
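To tie the two phases together, here is a rough sketch of what the catalog and copy steps could look like with the AWS SDK for JavaScript v3 and a local SQLite state file. Again, this is not S3Migrate's actual code: the better-sqlite3 choice, the table layout, and the smallest-first ordering are assumptions made purely to illustrate the flow described above.

```typescript
// Sketch only: catalog the source bucket into SQLite, then copy what's pending.
// Schema, column names, and ordering are illustrative, not the tool's real ones.
import {
  S3Client,
  ListObjectsV2Command,
  GetObjectCommand,
  PutObjectCommand,
} from "@aws-sdk/client-s3";
import { Readable } from "node:stream";
import Database from "better-sqlite3";

// Built from the source/destination credentials, as in the previous snippet.
const sourceClient = new S3Client({ /* source credentials + endpoint */ });
const destinationClient = new S3Client({ /* destination credentials + endpoint */ });

const db = new Database("state.sqlite");
db.exec(`CREATE TABLE IF NOT EXISTS objects (
  key TEXT PRIMARY KEY,
  size INTEGER,
  last_modified TEXT,
  copied INTEGER NOT NULL DEFAULT 0
)`);

// Catalog phase: page through the source bucket and record every object,
// plus the metadata needed later for sorting and progress tracking.
async function catalog(bucket: string): Promise<void> {
  const insert = db.prepare(
    "INSERT OR REPLACE INTO objects (key, size, last_modified) VALUES (?, ?, ?)"
  );
  let token: string | undefined;
  do {
    const page = await sourceClient.send(
      new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
    );
    for (const obj of page.Contents ?? []) {
      insert.run(obj.Key, obj.Size ?? 0, obj.LastModified?.toISOString() ?? null);
    }
    token = page.NextContinuationToken;
  } while (token);
}

// Copy phase: read the pending objects in the requested order (smallest first
// here), stream each one from source to destination, and mark it as copied so
// an interrupted run can resume from where it left off.
async function copy(srcBucket: string, dstBucket: string): Promise<void> {
  const pending = db
    .prepare("SELECT key, size FROM objects WHERE copied = 0 ORDER BY size ASC")
    .all() as { key: string; size: number }[];
  const markCopied = db.prepare("UPDATE objects SET copied = 1 WHERE key = ?");

  for (const { key, size } of pending) {
    const src = await sourceClient.send(
      new GetObjectCommand({ Bucket: srcBucket, Key: key })
    );
    await destinationClient.send(
      new PutObjectCommand({
        Bucket: dstBucket,
        Key: key,
        Body: src.Body as Readable, // a Readable stream in the Node.js runtime
        ContentLength: size,        // known from the catalog, so we can stream
      })
    );
    markCopied.run(key);
  }
}
```

Because the progress lives in the SQLite file, killing the process and re-running the copy command would simply pick up the rows still marked as not copied, which is essentially the resumability behavior described above, and changing the ORDER BY clause is what a size- or date-based prioritization option would map to.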