A
Hello and happy 2026. In our latest episode of AWS Bites, we covered ECS Managed Instances, a new way to power your ECS cluster with managed EC2 capacity. You still use EC2 instances under the hood, but AWS takes care of the usual instance chores: picking the base image and operating system, lifecycle, security, patching, and ongoing maintenance. You focus on describing what you need, for example a GPU, specific storage profiles, or particular networking characteristics, and AWS provisions the right instances for your cluster in exchange for a management fee.
Now, coming straight out of re:Invent in Las Vegas, which is AWS's biggest conference of the year, AWS has taken this very idea of managed instances and applied it to, drumroll, Lambda. The announcement is called Lambda Managed Instances, and it has been a little bit controversial. It has sparked a lot of curiosity, plus a fair amount of confusion. Because let's be honest, why in the world would you want to bring EC2 instances, aka servers, into one of the most serverless compute services out there? So you might be wondering: what does it really change for Lambda? Do you gain something that you could not do before?
Honestly, we were as curious as we were excited about this announcement, so we took Lambda MI for a proper test drive during the holidays, and in this episode we are going to share our take on it. We want to share what Lambda Managed Instances enables that default Lambda could not do before. There are a few spoilers I want to give you here just to capture your attention. There are no more cold starts, kinda; we'll talk more about that. And a single Lambda environment can now handle multiple requests concurrently, which is also very interesting, and we'll cover that in detail. We'll also talk about how to set it up and make the most out of it.
We'll cover the different ways to scale the underlying EC2 capacity, the limitations and pitfalls we noticed (and there are quite a few to be aware of), and the use cases where we think this new technology is actually a good fit. And of course we'll talk about pricing and whether it's worth the cost and the effort. Since we built a realistic application for this exercise, we'll also talk about our use case and our example application.
My name is Luciano, and I'm joined by Eoin, and this is AWS Bites. AWS Bites is brought to you by fourTheorem. Stay tuned to hear more about fourTheorem at the end of the show. So, Eoin, maybe we should start by clarifying: what is the difference between default Lambdas, as I guess we should call them, and Lambda on EC2?
B
Let's call the old model the default. That's what AWS calls it in the documentation, so let's roll with that. Default Lambda is the fully managed, just-run-my-code service that we know and love, and what most people mean when they say Lambda functions. A function is invoked in response to an event. An object uploaded to S3 could be an event that triggers a Lambda that resizes an image or extracts metadata; that's the canonical example. When an event arrives, AWS runs your code in an isolated execution environment. Where does this environment exist in terms of compute? It doesn't matter. Of course it will be on a server somewhere, but we don't get to see that and we don't care. That's what we like. The important detail is that each environment processes only one event, or one invocation, at a time. If more events come in, Lambda scales out by creating more environments to handle concurrency. And if traffic drops, Lambda scales the execution environments back, often effectively to zero.
The main trade-off is cold starts. You don't pay for all the scaling that Lambda does behind the scenes and you don't think about it, but it means that sometimes there isn't a warm environment ready to serve your event, and Lambda needs to create a brand new one, which takes time. Sometimes it could be a few seconds, sometimes it's as low as 100 milliseconds, or even less if you're using Rust, for example. But often it can be significant, and we've talked about that in previous episodes when we dealt with Python and large data science packages.
Now let's talk about the new option, Lambda Managed Instances. The Lambda service keeps the same programming model and integrations, but it changes how and where your code runs. Your function now executes in containers on EC2 instances in your account.
These EC2 instances are chosen via something called a capacity provider, and we'll talk more about that in a second. AWS still manages provisioning, patching, routing, load balancing, the scaling mechanics, and the lifecycle, and that's important. You are not taking on the whole burden of managing EC2 instances. What changes is how capacity is scaled and provisioned, and also the cost model for Lambda, which we'll talk about too. But it's not just Lambda on different hardware. There are a few big behavioral changes that will materially affect how your functions perform, how they scale, and how you need to write and operate them, so it's definitely something you need to go into fully informed.
So what changes in practice? For one, the concurrency model. One execution environment can now handle multiple concurrent invocations, unlike default Lambda's single-invocation-per-environment model. This is a big change in how Lambda operates and something that people have been asking about for a long time. It reminds us a little bit of Vercel's Fluid Compute. It also makes Lambda suitable as a backend for high-throughput APIs. We've seen people who are familiar with, say, the Node.js ecosystem and how it can handle high-concurrency asynchronous I/O in one process, which was one of the great benefits of Node.js when it originally came out. Then they move to Lambda and wonder: am I losing all of that power because I can only handle one event per process? Now you can handle multiple events per process, so you get a little bit more of that benefit back.
Other things have changed with managed instances too. We now have always-on environments: environments can stay continuously active, with no freezing between invocations, and this helps mitigate most of the traditional cold starts. Scaling is now asynchronous and driven by things like CPU utilization.
It doesn't scale to zero because it maintains a configured minimum capacity, and fast traffic ramps can briefly outpace scaling, so you'll need to think about that. When you publish a function to run on managed instances, Lambda by default launches three EC2 instances in your account for resiliency, which is good practice, and it brings those environments up before making the version active. There are ways, which we might touch on a little later, to avoid having three instances running all the time, but it's generally a good practice.
As for the instance lifecycle, these instances will stay up for a maximum of 14 days. That's what Lambda says, and then it rotates them, so that's something you should plan for if it's important. With this model you can also pick instance characteristics, so you can think about the latest CPUs you want, like Graviton4. You get configurable memory-to-CPU ratios. If you want high-bandwidth networking, which is something you didn't have much control over before, you now have that option. These are all new considerations.
What stays the same? It's still Lambda in how you build and integrate it. But you should treat your code more like a concurrent, long-lived service process, because the execution environment can handle parallel invocations and sticks around. If you're used to short-lived execution environments, sandboxes that didn't hang around, you may not have noticed your memory leaks before. With these 14-day instances, it might be something you'll have to think about.
A
Should we talk now about how scaling works, in a little bit more detail? What do you think? Okay, I'll try to cover that. I guess the first mental model shift is that Lambda Managed Instances scales proactively, as you explained, and not on demand, which is the case for default Lambda. Let's repeat that just for clarity: with default Lambda, when an invocation arrives, Lambda looks for a free execution environment, and if none is available, it creates a new one on demand. This is where you see that famous, or infamous, cold start. With managed instances, Lambda does not scale because an invocation arrived. It scales asynchronously, basically up front, watching things like CPU utilization and multi-concurrency saturation. It's trying to determine how busy the environment is and whether it needs to provide more room for executing Lambdas. We'll talk about what that means in practice in more detail in a few minutes. But just think about these two differences. With default Lambda, you basically don't have to think about anything: when an event arrives and there are no environments, they are created on demand. With Lambda MI, those environments are created in the background, up front, before your code is executed.
One analogy that I think can explain this idea a little better is managing a restaurant. Of course you need tables in a restaurant, and you can imagine that those are your EC2 instances. Then you have execution environments, and we can compare those to your staff working and serving those tables. And max concurrency is the idea of how many guests a single server can handle at once.
If we use this analogy, which is a bit of a stretch but I think still useful to get the mental model right, we can think about scaling in these terms. When demand goes up, Lambda MI can scale two different layers. It can add more execution environments on existing instances. This is like hiring more staff in the same restaurant: same space, you just have more staff available to do more things. And if your instances are running out of room, it can add more managed instances, so you're adding more EC2 capacity under the hood through your capacity provider. This is like adding more tables to the restaurant, and you should think of that not as cramming more tables into the same space but as taking more space. Maybe you're putting tables outside, or maybe you're expanding the restaurant somehow, taking over the next building or something like that. So these are the two dimensions used to increase your capacity for running more Lambda code, and this is why scaling can feel a little different with this new way of running Lambda code.
Also, because this scaling is asynchronous, it tries to scale up front. But if lots of demand arrives very quickly, you might not have that capacity available when you need it, so you might actually see throttles: you try to execute Lambda code, there is no capacity available, and your invocation is throttled. There are some default mechanisms that AWS put in place where, basically, if your traffic doesn't double within five minutes, you should be okay. But I think if you go over that, so if your traffic doubles very quickly, that's probably where you start to see throttles. You need to play around with that to see exactly how it works.
But this is what we can infer from the documentation. So again, just to recap, the main change is this: instead of thinking that the first request is going to be slower, which is the cold start case, the failure mode is more that a sudden spike might cause throttles. In general, if you have predictable traffic and enough capacity, you are not going to see cold starts or throttles. That gives you a more predictable and always-available environment to run on, which is nice, especially if you're using languages that tend to have quite long cold starts, like Java, Python, or sometimes Node.js, or if you have lots of dependencies that take a long time to bring the environment up for the first time. This can be a nice use case where you can effectively eliminate that problem.
Now, if we want to dive even deeper, there are a few moving pieces we also need to mention: the router, the scaler, and the Lambda agent. This is more to explain how AWS implemented all these things under the hood. In practice, when you publish a new version with a capacity provider, Lambda launches managed instances in your account. As we said, by default there are three instances for resiliency, in different availability zones (AZs), and the version eventually becomes active. So effectively you publish a new Lambda, the instances are created in different availability zones, and that Lambda is now considered active. When invocations come in, your environment starts to consume CPU and memory. That's where the Lambda agent comes in: it runs within the EC2 instances and reports this consumption to what AWS calls the scaler. The scaler is the component that decides whether to add more environments or more instances.
So it's a continuous conversation between all these moving parts to determine: are we using enough capacity, do we need more, is there still space to run more Lambda functions? The router, instead, is responsible for deciding, whenever an invocation comes in, which environment on which EC2 instance handles that particular invocation. When traffic goes down, and that's another thing we need to consider, the agent reports that too, and the whole system can decide to scale down environments and instances accordingly.
So I think that gives you the general idea of how things work. The restaurant analogy covered it at a more abstract level, and then we went a bit further into the actual implementation details with the different components. But what probably matters most is what you can control. As a developer building applications with this new model in mind, what kind of tweaks and toggles can you touch?
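To make the two scaling layers from the restaurant analogy concrete, here is a toy sketch in TypeScript. AWS does not publish the scaler's algorithm, so everything here is an assumption for illustration: the names, the idea that a fixed number of environments fits on one instance, and the simple target-tracking rule (desired = current × observed ÷ target) borrowed from how utilization-based scaling is commonly described.

```typescript
// Toy model only: AWS does not publish the scaler's algorithm.
// All names and the target-tracking rule below are illustrative assumptions.
interface ScalingInput {
  currentEnvironments: number;     // execution environments running now (the "staff")
  observedCpuPercent: number;      // average CPU utilization reported by the agents
  targetCpuPercent: number;        // utilization the capacity provider aims for
  minEnvironments: number;         // configured floor
  maxEnvironments: number;         // configured ceiling
  environmentsPerInstance: number; // how many environments fit on one instance (a "table")
}

interface ScalingPlan {
  environments: number;
  instances: number;
}

function planCapacity(input: ScalingInput): ScalingPlan {
  // Layer 1: how many environments keep utilization at or below the target?
  const raw = Math.ceil(
    (input.currentEnvironments * input.observedCpuPercent) / input.targetCpuPercent,
  );
  const environments = Math.min(
    input.maxEnvironments,
    Math.max(input.minEnvironments, raw),
  );
  // Layer 2: how many instances are needed to host that many environments?
  const instances = Math.ceil(environments / input.environmentsPerInstance);
  return { environments, instances };
}
```

With 4 environments at 90% CPU and a 60% target, this model asks for 6 environments, which no longer fit on a single 4-environment instance, so a second instance is needed. That is the "more staff" step spilling over into the "more tables" step.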
B
Well, at the function level you can pick how big your execution environment is, in terms of vCPUs and memory. The smallest supported size is 2 gigabytes and 1 vCPU. The key is to choose a size that supports your intended multi-concurrency, because each environment is meant to handle multiple invocations. The other big thing here is that previously you had a 10 gigabyte memory limit in default Lambda mode. Now you can have 32 gigabytes available to a Lambda invocation, which is a big change. The rule of thumb, I think: if you're doing CPU-heavy work with not much I/O, you typically want more vCPUs rather than just cranking up concurrency. Then you can specify the maximum concurrency per environment. In default Lambda, that would be a one-to-one ratio, but here you can have up to 64 concurrent invocations per vCPU. You can increase it if each invocation is light on CPU and more I/O-bound; then you get more throughput per environment, and a cost benefit as well. And you can decrease it if you're memory-heavy and CPU-light. That's at the function level.
Then at the capacity provider level you can specify your target resource utilization, so how much headroom you really want. A higher target means higher utilization and potentially lower cost, but less headroom. A lower target gives you more spare capacity for bursts, but you'll pay for more idle compute. You can also constrain instance types, but AWS recommends letting Lambda choose for best availability, so don't be too restrictive and specific about which instance types you support.
When it comes to the two scaling modes, manual versus automatic, you specify which one you want at the capacity provider level. Auto is the default. Manual exists for when you want precise control over the scaling threshold. From what we've seen, that's basically just a CPU scaling threshold.
That is, a CPU utilization scaling threshold. I haven't yet seen any way, and I haven't really tried either, of using custom metrics or other metrics to scale, like you could with an Auto Scaling group. Then separately, at the function level, you can specify the minimum and maximum number of execution environments. This is a particularly important one, because if you want your function to be invocable, you have to specify a non-zero minimum. AWS might then add more as it sees fit. But if you set it to zero, as we'll discuss later, that basically turns off your function, making it not invocable. You can change the scaling characteristics with the PutFunctionScalingConfig API, and that gives you more brute-force, manual control over scaling before a batch job or background processing, for example if you wanted to do some large-scale processing during the night. I hope that made some sense. Luciano, do you want to talk through what we built?
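The sizing limits mentioned in this answer can be written down as a small sanity check. This is only a sketch: the constants are the figures quoted in the episode (2 GB and 1 vCPU minimum, 32 GB maximum, up to 64 concurrent invocations per vCPU) and may change, and the type and function names are made up for illustration rather than taken from any AWS SDK.

```typescript
// Limits as quoted in the episode; treat them as assumptions that may change.
const MIN_MEMORY_GB = 2;
const MAX_MEMORY_GB = 32;
const MAX_CONCURRENCY_PER_VCPU = 64;

// Hypothetical shape for an execution environment's sizing (not an AWS type).
interface EnvironmentSizing {
  vcpus: number;
  memoryGb: number;
  maxConcurrency: number; // concurrent invocations per environment
}

// Returns a list of problems; an empty list means the sizing is within the quoted limits.
function checkSizing(s: EnvironmentSizing): string[] {
  const problems: string[] = [];
  if (s.vcpus < 1) {
    problems.push("at least 1 vCPU is required");
  }
  if (s.memoryGb < MIN_MEMORY_GB || s.memoryGb > MAX_MEMORY_GB) {
    problems.push(`memory must be between ${MIN_MEMORY_GB} and ${MAX_MEMORY_GB} GB`);
  }
  const ceiling = s.vcpus * MAX_CONCURRENCY_PER_VCPU;
  if (s.maxConcurrency < 1 || s.maxConcurrency > ceiling) {
    problems.push(`max concurrency must be between 1 and ${ceiling}`);
  }
  return problems;
}

// Memory each in-flight invocation gets, on average, when the environment is saturated.
function memoryPerInvocationMb(s: EnvironmentSizing): number {
  return (s.memoryGb * 1024) / s.maxConcurrency;
}
```

The second helper is a reminder of the trade-off Eoin describes: raising concurrency improves throughput for I/O-bound work, but every in-flight invocation shares the same memory, so a 4 GB environment at a concurrency of 16 leaves roughly 256 MB per invocation.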
A
Yes. We like to do practical test drives, so we needed to find an excuse and think about what we could build that makes some sense in the context of Lambda MI and its particular characteristics. What we came up with is, again, video processing, which seems to come up a lot in our examples, maybe because we think about this podcast and how to optimize its production. To be fair, this is not a full implementation. We didn't really wire in FFmpeg, ML vision models, or subtitle generation. The processing is simulated, but we built everything around it, so in the computation bit you could plug in whatever logic you like. If you want to use what we built to, say, extract the audio from a video, convert the video, or do whatever else you can do with FFmpeg or something else, you can definitely do it.
The idea is that we built a service with three main components. First, there is a REST API that allows you to manage videos. Effectively it's a CRUD API where you can create a video entry, list entries, get the details, and most importantly trigger processing. We'll see how that works in a second. One detail that might be important to know is that we built a bit of a Lambdalith: one Lambda that responds to all the routes, behind an API Gateway HTTP API. Second, we have a simulated video processor. Whenever you call the process API endpoint we mentioned before, you trigger the processing of a video, and that happens in this other component. This is where you would put all the heavy lifting. As we said, different use cases might come to mind.
Thumbnail generation, transcoding, content analysis, subtitle generation, chapter generation, whatever you think makes sense. In a real system this would generally be CPU-intensive work, so it's particularly sensible for this use case, where you might spin up a lot of EC2 capacity just to be able to do that at scale. And you still have the convenience of Lambda to package your code in a way that gives you a nice developer experience.
And then finally we have a step function, which may be a little unexpected, so we'll explain why we did that. The idea is that with this step function we can orchestrate everything. We want capacity always available for the REST API, but for the processor component we only want capacity on demand, while still having the convenience of Lambda MI. I don't know if this is a bit of a hack; it feels a little like one. We keep the capacity for the processor at zero, which means that by default that Lambda is deactivated. Then any time somebody calls the process endpoint, we use the step function to spin up capacity. We change the capacity definition at runtime to go from zero to a different range, which you can configure to whatever makes sense for you, from one up to whatever you like. At that point we start to see the instances appearing, and the step function monitors until there is enough capacity to start the processing and run our code. So it is a bit of a hack to get scale to zero, and it only makes sense if you control the event that triggers the processing. It wouldn't make much sense, for example, for a REST API. So this is how we tested the two different scenarios.
One where we have some kind of manual control of the capacity, and the other where we just let the service manage all of that. The way we do the scaling up, or effectively control the scaling configuration of the processor function, is through an API called PutFunctionScalingConfig, where you can define the minimum and maximum capacity. If you set the minimum to zero, you are basically saying this environment is disabled. When you set it to one or more, you let the capacity provider create the EC2 instances, and that's when your Lambda function becomes active and you can invoke it.
Other small details that might be relevant: we use CDK with TypeScript, and if you use the latest versions, everything we just mentioned is supported out of the box, so there were no weird hacks that we could see. We use Node.js 24 on ARM64, and we store metadata in DynamoDB. So all pretty standard, I would say. All this code is publicly available on GitHub, and you'll find the link in the show notes. Feel free to check it out and let us know if you like it, or submit a PR if you find something that doesn't work or that you want to change and improve. So based on all of that, what are our impressions, and what limitations or pitfalls did we discover?
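The scale-from-zero flow described here (raise the minimum, wait for capacity, run the work, drop back to zero) can be sketched in plain TypeScript. This is not the code from the repository and it does not use Step Functions. The `ScalingControl` interface is a hypothetical wrapper: `setEnvironments` stands in for a call to PutFunctionScalingConfig, whose exact request shape is not shown in the episode, and `isActive` stands in for whatever readiness check you use. Setting both values to zero at the end is also an assumption about how you would deactivate the function again.

```typescript
// Hypothetical wrapper; the real SDK call shapes are not assumed here.
interface ScalingControl {
  // Stand-in for PutFunctionScalingConfig (min/max execution environments).
  setEnvironments(min: number, max: number): Promise<void>;
  // Stand-in for a readiness probe on the function version.
  isActive(): Promise<boolean>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withCapacity<T>(
  control: ScalingControl,
  range: { min: number; max: number },
  work: () => Promise<T>,
  opts = { pollMs: 5000, maxPolls: 120 },
): Promise<T> {
  // 1. Go from zero to a non-zero range so instances start appearing.
  await control.setEnvironments(range.min, range.max);
  try {
    // 2. Poll until the function is active, then run the actual processing.
    for (let attempt = 0; attempt < opts.maxPolls; attempt++) {
      if (await control.isActive()) {
        return await work();
      }
      await sleep(opts.pollMs);
    }
    throw new Error("capacity did not become active in time");
  } finally {
    // 3. Always scale back to zero, even if the work failed or timed out.
    await control.setEnvironments(0, 0);
  }
}
```

The `finally` block is the important design choice: with this pattern you pay for EC2 capacity for as long as the minimum is non-zero, so the scale-down step should run on every path, including failures.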
B
Well, given that we just did the ECS Managed Instances review in our last episode, selecting instances with Lambda MI is a bit more limited. In ECS MI you have more of a query system where you specify the characteristics of your instance requirements and AWS picks the right instance type for that, which abstracts you from having to think about specific instance types. And if new instance types become available, they automatically get included in the potential query results. With Lambda MI you can only specify a very limited set of parameters initially: maximum vCPU count and the instance type filter, which is basically allowed instance types or excluded instance types. So an inclusion list and an exclusion list, and you still have to think about actual instance types. It's strange they didn't follow the same model.
Region availability is limited for now, which matters if your workload is multi-region or you want to use specific regions. Right now we've got us-east-1, us-east-2, us-west-2, ap-northeast-1, and eu-west-1, so just five regions.
In terms of runtimes, support is for the latest version only. Some people, especially enterprises, rely on the ability to pin specific versions of managed runtimes. So if you're migrating existing functions, anything on older runtimes will not qualify. You have to use the latest supported one; anything on the deprecated or previous list is not possible.
VPC networking is now mandatory. Of course it is, because you're running EC2 instances, and EC2 instances always need a VPC. There are no exceptions. So if you put them in a private subnet with no egress, you're going to have an issue. You have to make sure your VPC has reachability to the AWS services your Lambda function might need, like S3, DynamoDB, or SSM, so you'll need either a NAT gateway, an internet gateway, or VPC endpoints.
This is exactly the same concern as with a normal Lambda function in a VPC, except that you don't have the option to avoid VPCs.
Deployments with Lambda MI can also be noticeably slower. The first publish on a new capacity provider has to launch managed instances and bring up execution environments before the version becomes active. AWS explicitly says that this can take several minutes, and we've seen around 8 minutes for an end-to-end deployment.
Then there's a minimum size jump. You don't have the option of small utility Lambda functions with 128 megabytes of RAM anymore. With managed instances, the smallest size is 2 gigabytes with 1 vCPU, so it could be a surprise if you're just trying it out and trying to keep resources and costs minimal.
It's also important to know that creating a capacity provider with manual scaling still spins up baseline capacity. In our testing, even when we created a capacity provider without attaching any functions to it, AWS started two instances. That's worth knowing if you expect it to be zero until you attach a function. We also had cases where, after scaling down, instances remained active for arbitrary lengths of time. It's definitely something to keep an eye on. Anything else to add to that, Luciano?
A
Yeah, I think it's worth remarking that scale to zero is possible, but it is complex, so to speak. We have an example in our repo and people can check out how we achieve it, but I don't think it's a general-purpose approach that you can use for any use case. In our use case it makes sense only because we have a very clear execution path that determines when we need that compute capacity available. We effectively recreate something like a cold start, if you want to call it that, using EC2 instances under the hood and changing the min and max execution environments on the fly. But as we said, that wouldn't work for an API, for instance. So you need to be aware that while it is possible, it is complex, and it's not something you can use as a general-purpose mechanism. The main reason is that when you set the minimum execution environments to zero, your Lambda function becomes deactivated. There is a very clear indication of this: when you open that Lambda function in the web console, you see a blue banner saying this version is deactivated, and that to activate it you need to set the scaling to a non-zero value. So just be aware of that. If you were expecting automatic scale to zero, that's not necessarily the case.
The other thing is that concurrent execution comes with a few more headaches from a developer perspective. I don't think that's necessarily a bad thing, it's probably useful, but it's something you need to be aware of and consider in your code, because otherwise you might have unexpected bugs or side effects. It's the same thing you need to worry about any time you run code concurrently, whether in a container or anywhere else, where you might end up with race conditions or issues of that kind.
In the case of Node.js, for instance, which is what we used, this can happen when you have global state that is shared across concurrent invocations. Imagine you have a global variable declared outside your handler that you reference inside your handler. Because you might have multiple instances of that handler running concurrently, if they all change that value you might end up with inconsistent state, where one handler suddenly sees data that was changed by another. Imagine this is, I don't know, a user session: you might have two users effectively overwriting each other, and this can lead to very serious bugs. So just be aware of that. Of course there are ways to avoid this problem. We're not going to go into detail here on how to solve it, but be aware that the problem exists, and there are tons of best practices you can find online for your specific language so that you don't run into the kind of threading or concurrency issues you now might have, depending on your language of choice, just because you have concurrency.
Another similar issue is the /tmp folder, which is also shared between concurrent executions. The same problem can happen: if one invocation creates a file and another invocation tries to create the same file, they might end up overwriting each other. So make sure you select file names that don't conflict, maybe using UUIDs or something like that.
Logs can also be problematic in that sense because they will interleave. I suppose this is the reason why structured JSON logs are enabled by default and you don't get to change that. AWS says they will include a request ID by default in every JSON line.
That should make it a little easier to avoid confusion when you're looking at the logs, but you can still see interleaved lines, so it's up to you to filter by request ID. There are lots more potential pitfalls, and it's nice that AWS has published a documentation page that goes into a lot of detail, not just on the problems but also on the solutions. We'll put a link in the show notes if you're curious. The guidance is also organized by programming language, which removes a lot of the noise: depending on your language of choice, you can focus on what really matters for that language. Now, I suppose the last topic, and probably one of the most interesting for most people, is: how much is this going to cost me?
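The shared-state pitfall Luciano describes is easy to reproduce outside Lambda. The sketch below is a simplified illustration, not code from the episode's repository: the handlers take a plain string instead of a real Lambda event, `sleep` simulates any awaited I/O, and `scratchPath` is a made-up helper that scopes scratch files by request ID as one alternative to UUIDs.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// UNSAFE: module-level state is shared by every invocation running in this process.
let currentUser = "";

async function unsafeHandler(user: string): Promise<string> {
  currentUser = user;
  await sleep(5);     // any await gives other invocations a chance to run
  return currentUser; // may now hold another invocation's user
}

// SAFER: keep per-invocation data in local variables (or pass it explicitly).
async function safeHandler(user: string): Promise<string> {
  const requestUser = user;
  await sleep(5);
  return requestUser;
}

// /tmp is shared too, so include something unique per invocation in file names.
// Here that is the request ID; a UUID would work just as well.
function scratchPath(requestId: string, fileName: string): string {
  return `/tmp/${requestId}-${fileName}`;
}
```

Running `unsafeHandler("alice")` and `unsafeHandler("bob")` concurrently with `Promise.all` returns `"bob"` for both, because the second call overwrites the shared variable before the first one resumes. The same pair of calls against `safeHandler` returns `"alice"` and `"bob"`. With default Lambda this bug stays hidden because an environment only ever handles one invocation at a time.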
B
Yeah, a big change here really, because one of the biggest mindset shifts with Lambda Managed Instances is that billing is no longer the memory-times-duration model of default Lambda compute. You're not paying for that dimension at all anymore. Instead, you're paying for the underlying EC2 instances, which run in your account for a duration that you don't necessarily control at a fine level. With managed instances for Lambda, the official pricing model has three dimensions running in parallel.
First, you still have request charges. Just like with default Lambda, you pay 20 cents per million requests. That's simple, familiar, and independent of how long each request runs. In any bills I've seen, it's generally a tiny, negligible component compared to the other dimensions.
Second, you have your EC2 instance charge. You pay for the EC2 instances that back your capacity providers at standard on-demand EC2 pricing. The key benefit is that you can now apply EC2 Instance Savings Plans, Reserved Instances, and any other EC2 discount mechanisms that might be applicable.
Third, and this is the new thing, just like with ECS MI you've got a management fee, a managed instance tax if you like. With ECS that was 12%, as we worked out. With Lambda, it's a 15% premium calculated on the EC2 on-demand price of the instance. The important nuance is that EC2 discounts apply to the compute portion, not to the management fee. You always pay 15% of the on-demand list price for the management fee. Also, critically, Spot Instances are not yet supported, just like with ECS Managed Instances.
So what this all means is that if you've got steady state high volume workloads, you might have massive cost savings with Lambda mi, but you'll have to measure and have a look like you really need a consistent load or to be doing something like we did with the video processing example, where you're scaling it up for a certain amount of time, doing a large volume of batch processing and then scaling it down. But you can also things like multi concurrency and like you said Luciana, with the high volume API you've also potential for cost savings too. So your mileage will vary. But high volume requests, longer running lambdas really can benefit here. And of course remember that you can leverage existing savings plans with EC2 and reserved instances to save even more.
A
Yeah, I guess let's jump to the conclusions. I think what's important to mention here is that Lambda MI isn't a new version of Lambda. It isn't a Lambda replacing Lambda, or Lambda v2, or whatever you want to call it. It's just a different execution model, a new option for executing Lambda code. So it's still the same Lambda developer experience and integrations. It's just that the underlying compute is now something that you get to control, whereas before it was just happening magically behind the scenes. That's why we love to call it serverless. Now it's a little bit less serverless, but it's an option, and there are benefits and cases where you might want to use this particular option. Personally, I don't know if I like it or not. I think it's a bit sad that it makes my mental model, or decision-making process, or decision tree if you want to call it that, a lot more complicated, because now there are more options and more dimensions to think about. But at the same time it's also a good thing, because there are definitely cases where something like this is useful. So now you have that option without having to leave the comfort of Lambda, or without having to do a massive refactor if you already have a solution that runs on Lambda. So again, good and bad things: you have more options to decide on, but at the same time those options can be very useful in certain particular cases. I still expect those cases are maybe limited. Maybe it's more for enterprises with big workloads, or cases where you're doing cost optimization, or cases where you need a significant amount of concurrency. But those cases exist, so be aware that this option exists. As we said, it shines where you have steady-state, predictable loads, high-throughput APIs, CPU-heavy or long-running work, batch workflows, and so on.
I think it might not be the best fit if, for example, you have fast spikes, because those spikes can cause throttling. So depending on your use case, the more traditional Lambda approach might be better suited for that. You also have more knobs to tune: for instance, you have to think about max concurrency, utilization targets, and instance shape. So that's definitely something else that's worth considering and that adds complexity. And then the minimum function size is also something you need to be aware of. For instance, I really like to write single-purpose Lambda functions, so sometimes I end up with even hundreds of small Lambda functions. I think this approach pushes you a little bit more into Lambdalith land, as we did in our particular example. So just be aware: there is nothing necessarily wrong with it, it's just something you need to consider.
B
Sorry to interrupt, but I would really like it if there was an option here where, if it was scaling slowly and maybe you're getting some throttling because it's trying to provision a new ECS, or sorry, EC2 managed instance for you, you could say: well, in the meantime just use default Lambda for that function and handle the scale using the normal execution mode. But you can't mix the two with Lambda MI. You're either using one or the other, which is a bit of a shame. It would be nice to be able to have a blend.
A
Yeah, maybe we can consider this a feature request if anyone from AWS is listening. But I definitely agree with you. So just to close things off, the bottom line is that this is a new tool, and it might be a very good tool for the right workloads. It's not necessarily something we should consider as an upgrade to Lambda, but yet again it's definitely a useful tool, and it's worth knowing when to use it and how you can use it. I remind you that we have a repository with an example. You can find it in the show notes. So if you have tried it and you like it, let us know why. If you don't like it, also let us know why. We are always open to talking to you, hearing your opinion, and seeing whether you found other use cases that we didn't think about. One last thing: thanks to fourTheorem for sponsoring yet another episode of AWS Bytes. fourTheorem is an AWS partner. It's a consulting company. We can help you with your AWS architecture. We can make sure that your implementations are simple, scalable, and cost-sane. So if you're curious, check out fourtheorem.com, find our case studies, and get in touch if you want to know more. So thank you very much and we'll see you in the next episode. Bye.
Released January 16, 2026 | Hosts: Eoin Shanaghy & Luciano Mammino
In this episode, Eoin and Luciano dive into Lambda Managed Instances (Lambda MI), a newly announced execution model for AWS Lambda that brings managed EC2-style capacity to one of the hallmark serverless services. The hosts dissect what changes and what stays the same, explore hands-on use cases, compare Lambda MI to default Lambda execution, and highlight where Lambda MI fits into the evolving AWS compute landscape.
[00:00–04:30]
Lambda MI is inspired by ECS Managed Instances, but applied to Lambda: you still run your code on EC2—this time managed and provisioned by AWS, relieving you of AMI selection, OS patching, and ongoing maintenance.
Instead of classic on-demand Lambda scaling and cold starts, Lambda MI pools EC2 resources ahead of time and handles multiple requests concurrently in each environment.
Notable quote:
"Why in the world would you want to bring EC2 instances, aka servers, into one of the most serverless compute services out there?"
— Luciano [01:14]
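Spoilers teased:
No more cold starts (kind of).
A single Lambda environment can now handle multiple requests concurrently.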
[02:39–07:51]
Default Lambda: Each invocation runs in its own ephemeral environment; scales horizontally with incoming events; “cold starts” occur when new environments are launched.
Lambda Managed Instances: Your code runs in containerized environments on always-on EC2s within your account; these can serve multiple concurrent invocations.
AWS manages instance provisioning, patching, and scaling, but changes the scaling, provisioning, and cost model compared to standard Lambda.
Longer-lived environments (up to 14 days) mean memory leaks and state management bugs are more visible.
Notable quote:
"One execution environment can now handle multiple concurrent invocations, unlike default Lambda’s single invocation per environment model. This is a big change in how Lambda operates and something that people have been asking about for a long time."
— Eoin [04:44]
[07:51–14:00]
Default Lambda: Scales reactively—new environments created as needed for each event.
Lambda MI: Scales proactively/asynchronously, watching CPU and concurrency. May cause throttling on sudden high-traffic spikes until more capacity is ready.
Analogy: The “restaurant” analogy compares EC2 instances to tables, execution environments to staff, and max concurrency to the number of guests each server can handle.
Two ways to scale:
Scaling can feel less “automatic”—requires advance planning of capacity.
Notable quote:
"If your traffic doubles very quickly... maybe it's where you start to see throttles. But in general, if you have predictable traffic and your capacity is enough, you are not going to see cold starts or throttling—so that gives you a little bit more of a predictable and always available environment."
— Luciano [10:45]
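A rough way to reason about when throttling appears is to treat available capacity as the number of warm execution environments multiplied by the max concurrency of each one, as in the restaurant analogy. The sketch below is a simplified model only: the type names, function names, and numbers are hypothetical, and it does not call any AWS API or reproduce the real scaling algorithm.

```typescript
// Simplified capacity model; every name here is hypothetical, not an AWS API.
interface CapacitySnapshot {
  executionEnvironments: number; // warm environments currently available
  maxConcurrencyPerEnvironment: number; // concurrent invocations each one accepts
}

interface SpikeOutcome {
  served: number; // requests that fit in the current capacity
  throttled: number; // requests rejected until more capacity is ready
}

// Total number of invocations that can be in flight at the same time.
function totalConcurrentCapacity(c: CapacitySnapshot): number {
  return c.executionEnvironments * c.maxConcurrencyPerEnvironment;
}

// Split a burst of in-flight requests into served and throttled.
function simulateSpike(c: CapacitySnapshot, inFlightRequests: number): SpikeOutcome {
  const capacity = totalConcurrentCapacity(c);
  const served = Math.min(inFlightRequests, capacity);
  return { served, throttled: inFlightRequests - served };
}

const current: CapacitySnapshot = {
  executionEnvironments: 4,
  maxConcurrencyPerEnvironment: 10,
};

console.log(simulateSpike(current, 30)); // traffic within capacity
console.log(simulateSpike(current, 80)); // traffic doubled past capacity
```

With 4 environments at a max concurrency of 10, 30 in-flight requests are all served, while 80 leave 40 throttled until scaling catches up.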
Technical deep dive:
[14:00–17:01]
Function-level specs:
Capacity provider-level settings:
Ability to change execution environment min/max via API for dynamic scaling.
[17:01–22:02]
The hosts developed a simulated video-processing API, motivated by real-world workloads.
Three core components:
Demonstrated two operation types:
All examples written in TypeScript using CDK, Node.js 24 on ARM64, DynamoDB.
Notable quote:
"It is a little bit of a hack to effectively get that scale to zero, which of course only makes sense if you control the event that triggers, in this case the processing."
— Luciano [19:19]
[22:02–29:17]
Instance selection: Fewer “abstraction” options than ECS MI; you must specify allowed/excluded types.
Region support (as of Jan '26): Limited to 5 regions globally.
Runtime support: Latest versions only—no older or deprecated runtime support.
VPC required: All Lambda MI workloads must run in a VPC (networking/egress concerns apply).
Deployment lag: Initial deployments can take minutes, as instances must spin up and warm environments.
Minimal resource size: No tiny Lambdas; smallest allowed is 2GB/1 vCPU.
Manual scaling quirks: Creating a manual capacity provider can still spin up baseline instances, even unused.
Scale-to-zero is complex: Not as seamless as with containers; function deactivation occurs if min=0.
Concurrency headaches: Must account for thread safety, shared global state, filesystem collisions, log interleaving.
AWS Guidance: Extensive docs available, including per-language pitfalls and solutions.
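For the log interleaving pitfall listed above, one common approach is to write one JSON object per line with the request ID attached, then filter on that field. This is a minimal sketch assuming a hypothetical log shape (`requestId`, `message`); it is not the format the Lambda runtime itself emits.

```typescript
// Hypothetical structured log shape used only for this illustration.
interface LogEntry {
  requestId: string;
  message: string;
}

// Emit one JSON object per line so every line carries its own request ID.
function formatLog(requestId: string, message: string): string {
  return JSON.stringify({ requestId, message });
}

// Keep only the entries that belong to one request.
function filterByRequestId(lines: string[], requestId: string): LogEntry[] {
  const result: LogEntry[] = [];
  for (const line of lines) {
    try {
      const parsed = JSON.parse(line) as Partial<LogEntry>;
      if (parsed.requestId === requestId && typeof parsed.message === "string") {
        result.push({ requestId, message: parsed.message });
      }
    } catch (_err) {
      // Not a structured line (for example plain text output); skip it.
    }
  }
  return result;
}

// Two concurrent invocations writing to the same stream, interleaved.
const interleaved: string[] = [
  formatLog("req-a", "start"),
  formatLog("req-b", "start"),
  "unstructured line",
  formatLog("req-b", "done"),
  formatLog("req-a", "done"),
];

console.log(filterByRequestId(interleaved, "req-a").map((entry) => entry.message));
```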
Notable quote:
"Because now you get concurrent execution, that comes with a few more headaches from a developer perspective... if you're using Node.js, for instance, and you have a global variable, you might end up with inconsistent state. This is something that might lead to very serious bugs. So just be aware of that."
— Luciano [26:03]
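The global-variable problem described in the quote can be reproduced in a few lines of Node.js. The sketch below contrasts a module-level variable with per-invocation state held in `AsyncLocalStorage` from `node:async_hooks`. The handler names and the simulated work are hypothetical, and this is one possible mitigation, not necessarily the one the AWS guidance recommends.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface RequestContext {
  requestId: string;
}

// Unsafe: a module-level variable is shared by every invocation
// running concurrently in the same execution environment.
let currentRequestId = "";

// Safer: each invocation gets its own store for the duration of its async work.
const requestContext = new AsyncLocalStorage<RequestContext>();

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function unsafeHandler(requestId: string, workMs: number): Promise<string> {
  currentRequestId = requestId;
  await sleep(workMs); // another invocation can overwrite the global here
  return currentRequestId;
}

async function safeHandler(requestId: string, workMs: number): Promise<string> {
  return requestContext.run({ requestId }, async () => {
    await sleep(workMs);
    return requestContext.getStore()?.requestId ?? "unknown";
  });
}

// Run two overlapping invocations through each handler.
async function demo(): Promise<{ unsafe: string[]; safe: string[] }> {
  const unsafe = await Promise.all([unsafeHandler("req-1", 20), unsafeHandler("req-2", 5)]);
  const safe = await Promise.all([safeHandler("req-1", 20), safeHandler("req-2", 5)]);
  return { unsafe, safe };
}

demo().then((result) => console.log(result));
```

In the unsafe version both invocations report `req-2`, because the second one overwrites the shared variable while the first is still waiting. The `AsyncLocalStorage` version returns `req-1` and `req-2` as expected.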
[29:17–31:46]
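Billing now includes three components:
Request charges: $0.20 per million requests, as with default Lambda.
EC2 instance charges: standard on-demand pricing, eligible for EC2 Savings Plans and Reserved Instances.
Management fee: a 15% premium on the EC2 on-demand price; EC2 discounts do not reduce it.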
Potential for significant cost savings on steady, high-volume, high-concurrency workloads (and you can leverage EC2 discounts), but must be sized and measured carefully.
Multi-concurrency and big, long-running workloads may tilt cost advantage toward Lambda MI.
Notable quote:
"The new thing is, just like with ECS MI, now you’ve got kind of a management fee—a managed instance tax if you like... with Lambda, it’s 15% premium calculated on the EC2 on-demand price of the instance."
— Eoin [30:10]
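The pricing described in this section can be turned into a small estimator. The request price and the 15% fee come from the episode; the hourly rate, discount, and usage figures below are made-up placeholders, and the function is a back-of-the-envelope sketch, not an official calculator.

```typescript
interface MonthlyUsage {
  requests: number; // invocations in the month
  instanceHours: number; // total EC2 instance hours in the month
  onDemandHourlyRate: number; // USD per hour; placeholder, look up the real rate
  computeDiscount: number; // fraction from 0 to 1 (Savings Plans, Reserved Instances)
}

interface CostBreakdown {
  requestCharge: number;
  computeCharge: number;
  managementFee: number;
  total: number;
}

const REQUEST_PRICE_PER_MILLION = 0.2; // USD, as quoted in the episode
const MANAGEMENT_FEE_RATE = 0.15; // 15% of the on-demand price, as quoted

function estimateMonthlyCost(usage: MonthlyUsage): CostBreakdown {
  const onDemandCompute = usage.instanceHours * usage.onDemandHourlyRate;
  const requestCharge = (usage.requests / 1e6) * REQUEST_PRICE_PER_MILLION;
  // Discounts reduce only the compute portion.
  const computeCharge = onDemandCompute * (1 - usage.computeDiscount);
  // The fee is always computed on the undiscounted on-demand price.
  const managementFee = onDemandCompute * MANAGEMENT_FEE_RATE;
  return {
    requestCharge,
    computeCharge,
    managementFee,
    total: requestCharge + computeCharge + managementFee,
  };
}

// Hypothetical workload: one instance for a full month, 50M requests, 30% discount.
const estimate = estimateMonthlyCost({
  requests: 50_000_000,
  instanceHours: 730,
  onDemandHourlyRate: 0.1,
  computeDiscount: 0.3,
});

console.log(estimate.total.toFixed(2));
```

Raising `computeDiscount` lowers `computeCharge` but leaves `managementFee` unchanged, which is the nuance Eoin points out.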
[31:46–End]
Lambda MI is not Lambda v2 or a replacement: it's another tool in the kit.
Retains the Lambda developer experience; you just gain (and must manage) compute control, capacity settings, and concurrency.
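Best for:
Steady-state, predictable loads.
High-throughput APIs.
CPU-heavy or long-running work.
Batch workflows.
Not optimal when you need:
To absorb fast traffic spikes without throttling.
Many small, single-purpose functions (the minimum size pushes you toward a Lambdalith).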
New options ("more choices, more complexity"); careful evaluation needed per workload.
Request for AWS: Would love to see on-the-fly fallback from MI to default Lambda during scaling, but this is not currently possible.
Open-source example code and further documentation are provided in their GitHub repo.
Notable quote:
"This is a new tool and it might be a very good tool for the right workloads. It's not necessarily something we should consider as an upgrade to Lambda..."
— Luciano [34:47]
"Why in the world would you want to bring EC2 instances, aka servers, into one of the most serverless compute services out there?"
— Luciano [01:14]
"One execution environment can now handle multiple concurrent invocations... something people have been asking for a long time."
— Eoin [04:44]
"If your traffic doubles very quickly... maybe it's where you start to see throttles. But in general, if you have predictable traffic and your capacity is enough, you are not going to see cold starts or throttling..."
— Luciano [10:45]
"It is a little bit of a hack to effectively get that scale to zero..."
— Luciano [19:19]
"Because now you get concurrent execution, that comes with a few more headaches... you might end up with inconsistent state. This is something that might lead to very serious bugs."
— Luciano [26:03]
"A management fee—a managed instance tax if you like... with Lambda it’s 15% premium calculated on the EC2 on-demand price of the instance."
— Eoin [30:10]
"This is a new tool... It's not necessarily something we should consider as an upgrade to Lambda."
— Luciano [34:47]
| Feature/Aspect | Default Lambda | Lambda Managed Instances |
|---------------------------|-----------------------|---------------------------|
| Cold starts | Yes | Mostly eliminated |
| Concurrency | 1 per environment | Multi-concurrent/env |
| Scaling | On-demand/reactive | Proactive/asynchronous |
| Runtime size | 128MB min | 2GB/1vCPU min |
| Pricing | Pay-per-use | EC2 price + 15% fee |
| Spot/Discounts | N/A | EC2 Savings Plans, but not Spot instances |
| Deployment speed | Fast | Slower, minutes possible |
| Scale-to-zero | Automatic | Complicated, not simple |
| Networking | VPC optional | VPC required |
| Developer headaches | Fewer (less state) | More (manage concurrency, state, tmp) |
Sponsor mention: fourTheorem – AWS consulting & architecture specialists.