Loading summary
A
Not so long ago we did a deep dive into Amazon Aurora and we talked about all the non trivial things that you need to do to set it up. If you missed that episode it's episode 122 so check it out. But today we are going to be talking about something a little bit different because one of the things we said about Aurora Serverless is that it is definitely something that doesn't necessarily reach what we would call the gold standard of serverlessness. That with think when we think about something some data storage like for example DynamoDB. So that changed recently when Amazon challenged themselves and released a new database called Aurora the SQL. This happened at Re invent last year and just a few weeks ago this new shiny database became generally available. We finally tried out and we found where it excels, what are the limits and today we are going to talk through all of this, what we found out and a lot more. We got a fully working code example that we will be sharing during this episode and we'll also talk about how we stress tested it and did some performance and scalability measures. So welcome to another episode of AWS Bytes. My name is Tuciano and as always I'm joined by Owen. So let's get started. AWS Bytes is brought to you by fortiorm. Stay tuned until the end of this episode to find out more about fortiorm. So let's start with a little bit of an introduction. When you're choosing a database, there are always a few questions that you end up asking yourself. The first one might be, for example, scalability. How will this database scale both up and down to handle changing loads? For example, maybe you're building a small project like a startup thing that you want to experiment with. Chances are maybe you are lucky and it goes very successfully. So what happens to the database? Is it something you need to worry about or is it going to scale automatically to handle the success of the new application? The other aspect might be cost. For example, how much is it going to cost? And again this may be related as well with scalability. How does the cost change as you scale? For example, what is the billing model? For example Availability is another question. What happens if your database might not be available all the time? Like what? What kind of mechanism do you need to put in place? Or maybe something you can rely on and you don't have to worry too much without doing any kind of intervention yourself. Complexity is another element. How much stuff do you need to manage, set up, care and maintain over time? SQL vs NoSQL is probably one of the biggest questions that most developers would ask themselves because it changes really the way you build the application, the kind of data model you can store, and then the type of mechanism that you have to deal with the data. So that's always a very important trade off that you need to figure it out depending on the kind of application you are building. The SQL, interesting enough is a SQL database is a distributed SQL database and is designed to make these choices a little bit easier than other alternatives, especially when it comes to scale, availability and complexity. So we'll get on cost a little bit later in this episode. So how does DSQL make scale, availability and complexity access little bit easier than the alternatives we have seen so far in aws. The first thing is that the SQL scales automatically and virtually infinitely, meaning that effectively you don't have to worry. You can start very small and then as your application grows, as your user grow, DSQL should manage automatically all the scaling in terms of storage, compute and querying separately. They also guarantee you five nines of availability. So again, another aspect that you shouldn't be worried about and you can create a multi region cluster and in that case you get five nines. Sorry, five nines of availability. Yes, you can also get a single cluster, in that case you get four nines of availability. But still, I think it's a pretty good number given the fact that you don't have to worry about managing pretty much anything. And what we found out in our experiment is that it is by far the simplest SQL database to set up. It's pretty much as simple as creating a DynamoDB table and you have a cluster up and running. So effectively there is almost no configuration, there is no patching, there is no maintenance. The only setup you might need is to alter multi region support and backups, for example. So I would say it scores very high in our serverless scale. Another interesting thing that might be relevant for you as a developer is that it is postgres compatible. I will put an asterisk there. We'll probably talk a little bit more about why the asterisk throughout this episode, but the idea is that you can use any postgres connector client and it should work for most of the operation. And, and yeah, as I said, there are some limitations and some trade offs that you should be aware you should consider, but most of the things work, which means that generally you can test locally with a local normal postgres container, for example, and then when you want to test remotely you can use dsql. But yes, there are limitations so Owen, what are you going to tell us about that?
B
So right now DSQL is PostGrest version 16 compatible, so it uses the postgres query parser, the the planner, the optimizer, the type system and the SQL dialect and you get asset transactions. A difference with most databases is that it uses optimistic concurrency control and normally you would get a pessimistic locking approach. The optimistic approach lets transactions in a distributed environment like DSQL run without locking delays. Checks for conflicts then only happen at commit time. From a developer's point of view, this means that you should know when transactions may fail and build in retry support. They won't fail as early as you might be used to. If you watch some of the interesting deep dive talks from Re Invent, you'll know that Amazon uses their Time Sync service to ensure that transactions are isolated and we'll link to a video that will go into that detail. Now, authentication is AWS IAM only, so this is one thing you need to be mindful of. You get an admin role for your cluster and then you can create additional database roles with SQL and then you can associate those with IAM role arns. Then you use normal SQL grants to give specific permissions to these roles. So username, password authentication, all that kind of stuff, Kerberos, anything else you might need, you won't get that. Now, in terms of limitations, one of the big ones is that foreign key constraints are not supported. Obviously if you're coming from Dynamodb you don't get them there either, so you might not miss them. But it is one that will likely cause incompatibility issues with existing tools and ORMs. It's definitely possible to design a great application without foreign keys, but we've already experienced issues with ORMS and schema migration tools that rely on them by default, so you can expect to do some work. It's also worth mentioning that there are a few downsides to foreign key constraints, so we Recommend Reading a PlanetScale article on the topic, which will link in the description below. PlanetScale is another SaaS database provider. I think we talked about Neon in the past, which was recently acquired by, I think Databricks. D SQL is a little bit comparable to those options, I think, in terms of the ease of setup. Another limitation is that if you want to create indexes, which you probably will, you can't just do a create index statement like you might be used to. Instead you have to run Create index Async, which creates a specific D SQL type of Asynchronous index. When you do that, you get a job ID that you can monitor, since that index is being built asynchronously in the background. And again, when it comes to things like schema generation tools, schema migration tools, you might have to do some work to make sure that it's using the right syntax.
A
Yeah, this is one of the most annoying things that I found so far, especially because if you are used to use something like an ORM where maybe indices are created automatically through your own migrations, maybe if you're testing locally with a regular postgres, this is one of the places where your code will need to start to diverge between a local environment with a simple postgres and the SQL in production. But I digress. There are other limitations we want to talk about. For instance, you cannot create materialized views and temporary views, so you can only create unmaterialized views. Similarly, temporary tables are not supported in. You can only have one database in a cluster, but within that cluster you can actually have multiple schemas. So it's not that bad. No triggers or store procedures. So that's another limitation that if you have those things in an existing application that you're thinking to migrate, you need to think carefully because that's a feature that you need to live without, at least for the foreseeable future. And you cannot create columns of the type sequence, which is also another common thing that many ORMs will do out of the box. So be careful about that. If you use an ormonde, partitions as well are not supported, which in fairness doesn't make a whole lot of sense because probably the SQL internally manages all of that automatically. And this is probably another big one. Postgres extensions are not supported. So if you are a user of things like PGvector, it's not something you get here. So you'll need to figure out how to live without that. If you want to use the SQL serial sequence types and foreign key limitations can create problems again when you're using an ORM or a migration system. And it's one of these problems where if you really want to use them, because maybe out of the box you get them by a tool that you're using, then you need to figure out how do I do the same or an alternative thing in production when I ship my code to aws. So that's probably the biggest friction point that we found so far while trying to build an application using tools like ORMS and migration tools.
B
Let's get into our open source application, shall we? So we're going to link in the description below to our GitHub repo where we've created an API based application that you could deploy and you can test it out for yourself. And there's quite a lot in it. The API itself isn't that complicated. It's just two, I think entities in there. One which is it allows you to create lists and in those lists you can create items. So there are two resources and you get your usual post put patch, delete and get endpoints. And the technology behind this is mainly Node JS and TypeScript based. So it's a Fastify API backend using Lambda and the Drizzle orm. It's Lambdalith, monolithic Lambda. You might be interested in that because we've talked about the pros and cons of those in the past, but it worked well for us in this case. Made it quite simple. It uses power tools for TypeScript. We've got an API gateway REST API in front of us, we've got got endpoints to generate an open API specification automatically from the types. And we have load testing built into this repo as well. And we'll talk a little bit more about this load testing in a second. The whole thing is deployed then with cdk, so you can deploy the API pretty easily. There's two stacks. One that creates the database cluster was D SQL. I think we mentioned before that it has almost no configuration. I think in fact it has zero configuration. There are no options when you create a cluster. The only thing you can do is add regions. Now for load testing, we did a bit of thinking and exploring what we can use for this. And a lot of these load testing tools can take a lot of setup and that sort of thing. But we wanted to try and get some metrics so that we could give you some real data on how this works, rather than just talking about our subjective perspective. We ended up landing on K6, which is a tool from Grafana that does load testing. And it's a really simple tool to use. You could just write your test script in JavaScript and it will run it in its own special optimized runtime for load testing. So we tested 1000 virtual users over 2 minutes using K6. What this essentially means is that it's a session. It'll create 1,000 concurrent sessions and just run them repeatedly for two minutes. And this is just all from one machine running it from the the laptop. You can also use if you want to pay, you could pay Grafana to distribute this workload around the world for you and create a more realistic global load test. But we didn't go to that expense, we just did run it locally and it was enough for what we needed. So it's going to create lists, create items and lists do delete, put patch operations for each session and then we had a look at the performance. So we've a few ways of measuring performance. We've got CloudWatch metrics and we've got X ray tracing, tracing as well. And there was a couple of new features that we haven't tried in X Ray before that we were able to avail of. So looking at the metrics that you get out of the box with D SQL, it doesn't give you anything like, I think, RDS performance insights, which will give you like more detailed query performance, but it will tell you, you know, how many transactions you have, how much compute it's using in terms of computer milliseconds, how many bytes are being written and read, things like commit latency. It's all very interesting, but nothing particularly outstanding there. We could just see that we were getting tens of thousands of transactions, in fact hundreds of thousands of transactions happening. But what we did was we used power tools tracing from TypeScript and X ray to capture the lambda statistics, but also instrumenting the SQL queries as well. So instrumenting the database driver, the postgres database driver, so that we could see what query performance was like. And because we were using a monolithic lambda, we were taking an approach where we annotated each route with in. So we annotated traces with the route basically so we could see on an individual request level what the performance metrics were like. And one of the new features I mentioned in X Ray is transaction search. And this is essentially a way of getting open telemetry format traces in CloudWatch logs. And then you have like a CloudWatch logs insights type query interface in the AWS console and you can run statistics on all of your traces and spans and durations and you can get all of those for 100% of your traces, not just a sample. So what we saw from that is that the lambda runtime statistics for different routes, they varied quite a lot. But we were getting, I think, you know, average of between 20 milliseconds and 100 milliseconds of latency for the, for the lambda function execution. Now the looking at the percentiles then for P95, we were seeing that some of them, like retrieving all lists were taking like half a second. And if we look at the P99 stats there were some even greater ones, like the maximum ones were more like 700 seconds or 900 seconds. So we wanted to drill into this a little bit more. So looking deeper at the query segments themselves, we can see that most of the queries are double digit milliseconds. So quite most of them are definitely like in the 10 to 20 second range but there are some which are still like 100 milliseconds and if you look at the maximum ones they go into like half a second. So there is quite a lot of variance here. And we ran it multiple times and we ran it over multiple days, but we still saw this level of variation and it's interesting because sometimes Amazon in their D SQL marketing talk about single digit query performance similar to DynamoDB. Now you do see those in terms of the minimum values, but they're still like if you look at the P90 you're looking at three digit milliseconds for query performance. So it's generally good I would say. But if you're looking at really critical low latency workloads, that's something you might want to consider. Of course when you're talking about databases you also want to look at the connection time and this is a critical thing, especially with lambda, where you don't have things like RDS proxy, which we talked about in that Aurora episode 1, 2, 2. So we wanted to look at cold starts and try and measure how of that was being consumed by database connection time. And our average cold start time is about 537 milliseconds which I think is not unusual for Node JS runtime but probably on the higher end. But we also want to see like, okay, how much of that is initializing Fastify and all our other third party dependencies. So we looked at the, we added actually instrumentation and specific CloudWatch metrics for the database initialization time and that's the time to initialize your database driver, establish a connection. We run a test query which is really like tiny and very short in duration. It's just like select one and we saw that the connection time was between kind of 200 and 300 milliseconds for initializing your database. So that's definitely a significant part of that cold start time. It's the Lion's share, maybe 50 to 70% of would be nice if there was a to have faster connections but it's still, I would say altogether not too bad. And for the majority of SaaS applications, given that you're still getting quite a lot of single millisecond query time and double digit query time. I think it's pretty good, but could do better.
A
We should try that in Rust to see if the connection is established faster. But yeah, it's not something we have experimented actually.
B
Yeah, there is an interesting article by Benjamin Pyle who did a whole lot of research, actually has two articles. We'll link them in the show notes, but one of them is about using Rust to query D SQL and then the subsequent one is about trying to improve the query performance further by using Momento as a cache. Those are definitely worth reading.
A
Awesome. Should we talk a set about high availability of multi regions?
B
Let's do it. Let's get serious.
A
Okay, so yeah, that's definitely another topic that might be important to you, especially when you care about how the data is replicated and trying to make sure you're not going to lose any data. If you're running in production, you have critical applications running on the SQL. So as we said, you have two options, either a multi region and a single region cluster. So if you decide to use a single region cluster automatically, it uses three availability zones and every transaction is automatically committed to all of those three availability zones. If you use a multi regionD SQL, it gives you more availability and each region is effectively linked to a cluster and you get read and write endpoints with strong consistency. Every time you do write new data, that data is synchronously replicated across regions and effectively you get a 0rpo recovery point objective, which basically means that if a region fails at any point you are not losing any data. So to set up multi region there are a few steps that you need to follow. You have basically you need to have two or more participated regions and a witness region, which is basically a region that does not have a read or write endpoint, but is there just to because it maintains an encrypted copy of the transaction log and ensures the five nines of availability. So effective is just used to provide availability and support recovery. Now note that right now only US regions are supported as witness regions. This is something that you need to be aware. Hopefully it's going to be improved over time. But effectively if you are trying to achieve the highest level of availability and at the same time you are worried about data sovereignty, maybe there is a little bit of a conflict there where you might not be able to use the SQL to the fullest capacity. I think now is the time where we get to talk about pricing the time everyone is waiting for.
B
Yeah, and this is where we try to make it clear how expensive or cheap it is, but ultimately fail because it depends. But let's give it a go anyway. Similar to DynamoDB, it has these units that are counted as you read and write, with a separate storage cost as well. The units are called distributed processing units or DPUs, and the cost will vary per region. In Ireland it's like $9.50 for 1 million units, but note that this is quite a bit more expensive than $8, which is what you get in the US regions, but I think if you go to Asia Pacific regions it's like $10 plus. So it does vary quite a lot by region and then it's $0.36 per gigabyte of storage. Now there is a difference when you're comparing to DynamoDB, because DynamoDB bills for reads and writes, and it's pretty clear what a DynamoDB write capacity unit and read capacity unit means in terms of the amount of data being read and written. You can do a calculation on it with dsql. There's a lot more to it, but it's also a bit more vague. DSQL's DPUs are based on reads, writes, but also compute, and you can't really predict how much each query is going to use. I suppose it kind of makes sense that they've got this additional compute metric that they count into the billing because SQL allows you to run complex queries where DynamoDB doesn't, and that means they're going to compute, they're going to consume more than just I O overhead, and it's not going to be deterministic. So there's no way that we can know of to calculate DPUs for your queries in advance, you just have to try and measure. Thankfully there are usage metrics in CloudWatch for all of the things that are factors in your cost calculation, and Mark Bose has provided a script that you can use to get a cost estimate based on this usage, and he also has a good article on DC code pricing. The link for that is below too, and we actually found that when reading another useful article by Alessandro Volpicella, which is called the Amazon D SQL Pricing Guide, so check those out for sure. You do have a free tier with 100,000 DPUs and a gigabyte of storage per month. Inter region transfer will incur cost and data egress to the Internet as always incurs a cost beyond the usual 100 gigabyte free tier. You should also include the cost of keeping backups and you can by the way, integrate DSQL with AWS backup. So overall it's difficult to say how cheap or expensive D SQL can be compared to RDS or even Aurora Serverless. It just depends. It can be very cheap and a lot of the articles I've seen give examples that seem very cheap. But we can also see how with significant load it can get quite expensive for high load applications. Like if we talk about our load test for example, we said we measured 1000 virtual users hammering the system constantly for two minutes and already like having run that maybe half a dozen times this month using a script that was provided by Mark Bose in that article it'll be linked. There's a really nice script that'll just check your CloudWatch metrics and spit out a dollar amount. It will be an estimate, but it's already saying that for us, having run that workload for let's say maybe altogether somewhere in the 10 to 20 minute range, it's like $3. So that's just for minutes of usage for thousands of users. So you would if you have constant high load, you might want to think about investing in the pain of infrastructure management and using something like Aurora or rds or something else. But still, I would say definitely give it a try and measure it because the amount of effort it will take away, the amount of development time and cost it will save you will be pretty massive. So when should we use it? Let's give our pros and cons. Who is it good for? Who should steer clear of it?
A
Yes, I'm going to try my best to summarize that. Although it's always a difficult choice, it isn't very like binary. So let's see, what can we bring into the mix to help you figure out if it's good for your project or not. So I would say that if you're building a new greenfield application where you don't have to spend too much time effectively taking code that you have already written, maybe assuming a generic postgres, and trying to move it out where you might find unsupported features. In that case, it's probably a good starting point, like you have less things to worry about. So definitely if you're doing something new, it might be worth considering it. Another thing is when you have a desire to minimize infrastructure maintenance, so maybe you are doing a migration, but the cost that you would have by trying to figure out exactly things that you need to change, change them so that D SQL is fully supported maybe eventually becomes worth it because then you have less maintenance cost. So that could be kind of a trade off worth exploring as well. Another thing is that you have estimated cost for expected workload and know the the cost trade off makes sense. So effectively, if you have done maybe a few experiments with DSQL and you you have a fairly understand, a fairly good understanding of the usage patterns, they're not going to change too much. So you kind of guess, could guess quite exactly how much it's going to cost you and you see it's actually going to be much cheaper than alternatives. That could be another great use case that might justify a migration. And if you're not using libraries or tools that are requiring some of the unsupported features in the SQL or maybe if you're not using any of those postgres extensions and some of them are actually quite common, so it might actually be possible that that becomes one of the blockers along the road. So consider that as well as another thing that effectively if you are falling to all of these buckets, it might be worth exploring it and deciding then whether you want to stick with it or not. Now if we want to summarize things that maybe can easily get on your path and maybe at that point you should definitely avoid the sequel. Let me try to give you a few pointers. So one reason might be that you have an existing application that will require too much work and at that point it might not be worth anymore to invest that time compared to the benefits that you might get by migrating to it. Another thing might be if you need to guarantee foreign key constraints, those are not supported, so nothing you can do about that. Very similarly, if you need traditional indexes, that's a problem. There are indices you can use, but you'll need to change your code to be able to use those. So consider the cost into implementing those changes as well. If you need store procedures, extensions and things like that, then again those are not supported, so nothing you can do about that. If you have predictable load, maybe it's not going to be the most cost effective solution for you. And if you have already invested in provision capacity, so you have invested upfront, maybe switching is not going to give you the saving benefits that you get with provision capacity. Another thing is that if you need to use different authentication mechanisms, like Kerberos for instance, we mentioned only IAM is supported, so that's another blocker for you. So with this we get to the end of this episode. As we promised you, we'll have to spend a few words about Forthereum. Thank you for sponsoring another episode of AWS Bytes. Fortherm is the consulting company we work for and at for theorem. We believe that cloud should be simple, scalable and cost effective, so we help teams around the world to just do that. So whether you're diving into containers, stepping into event driven architecture, or scaling a global SaaS platform on AWS, or even trying to keep cloud spend under control, our team is available as your back. So definitely reach out. Visit fortiorm.com and see how we can help you to be successful with aws. We'll link all of the resources we mentioned in the show. Note so that's everything we have for today. I hope you enjoyed this episode. I hope you are curious to try the sequel and let us know if you like it or not. What kind of solutions are you gonna build with it? Thank you and we'll see you in the next episode.
Title: We Tried Amazon DSQL So You Don’t Have To (But You Might Want To)
Hosts: Luciano Mammino (“A”) & Eoin Shanaghy (“B”)
Date: June 26, 2025
In this episode, Luciano and Eoin share their hands-on experience with Amazon Aurora DSQL (Distributed SQL), a recently released database aiming to combine the ease and scalability of serverless data storage with Postgres compatibility. They discuss setup simplicity, feature limitations, real-world performance metrics, and pricing, finishing off with guidance on when DSQL could (or shouldn’t) be a fit for your applications. Listeners gain candid insights into DSQL’s strengths, shortcomings, and practical advice for prospective adopters.
“We finally tried out and we found where it excels, what are the limits and today we are going to talk through all of this…” – Luciano [01:05]
Factors in Choosing a Database:
Scalability (up and down)
Cost and billing model
Availability and built-in redundancy
Operational complexity and maintenance
SQL vs NoSQL trade-offs
DSQL aims to make scalability, availability, and simplicity “easier than other alternatives”.
Postgres compatibility is touted, but “with an asterisk” – not all features and extensions are supported.
Automatic Scaling & Availability:
"By far the simplest SQL database to set up. It's pretty much as simple as creating a DynamoDB table…" – Luciano [03:44]
Postgres Compatibility (with Caveats):
Authentication:
“You get an admin role for your cluster and then you can create additional database roles with SQL and then you can associate those with IAM role arns… Username, password authentication, all that kind of stuff, Kerberos, anything else you might need, you won't get that.” – Eoin [06:06]
“It is one that will likely cause incompatibility issues with existing tools and ORMs.” – Eoin [06:52]
CREATE INDEX ASYNC syntax and monitor background jobs; diverges from standard Postgres behavior.“That's probably the biggest friction point that we found so far while trying to build an application using tools like ORMS and migration tools.” – Luciano [08:48]
Performance Results:
“There is quite a lot of variance here... Amazon in their D SQL marketing talk about single digit query performance similar to DynamoDB. Now you do see those in terms of the minimum values, but... at P90 you're looking at three digit milliseconds for query performance.” – Eoin [15:49]
Notable moment:
"It would be nice if there was a to have faster connections but it's still, I would say altogether not too bad. And for the majority of SaaS applications, given that you're still getting quite a lot of single millisecond query time and double digit query time. I think it's pretty good, but could do better." – Eoin [17:10]
“We should try that in Rust to see if the connection is established faster.” – Luciano [17:52]
"If you are trying to achieve the highest level of availability and at the same time you are worried about data sovereignty, maybe there is a little bit of a conflict there…” – Luciano [19:21]
“If you have constant high load, you might want to think about investing in the pain of infrastructure management and using something like Aurora or rds or something else.” – Eoin [23:46]
Helpful Resources Mentioned:
"By far the simplest SQL database to set up. It's pretty much as simple as creating a DynamoDB table..."
– Luciano [03:44]
"It is one that will likely cause incompatibility issues with existing tools and ORMs."
– Eoin [06:52]
"That's probably the biggest friction point... while trying to build an application using tools like ORMS and migration tools."
– Luciano [08:48]
"General Lambda function execution latency: 20–100 ms, but P95 for some endpoints: ~500 ms."
– Eoin [15:49]
"It would be nice if there was a to have faster connections but... for the majority of SaaS applications ... it's pretty good, but could do better."
– Eoin [17:10]
"If you are trying to achieve the highest level of availability and at the same time you are worried about data sovereignty, maybe there is ... a conflict there..."
– Luciano [19:21]
"If you have constant high load, you might want to think about investing in the pain of infrastructure management and using something like Aurora or rds or something else."
– Eoin [23:46]
| Segment | Timestamp | |-----------------------------------------------------|-------------| | Introduction & DSQL context | 00:00–04:59 | | DSQL features & Postgres compatibility | 04:59–07:46 | | Limitations & dev friction | 07:46–09:46 | | Example app & load testing | 09:46–18:21 | | High availability, multi-region | 18:21–20:20 | | Pricing | 20:20–24:20 | | Pros, Cons, When to Use It | 24:20–end |
Further reading and code examples:
All relevant links, GitHub repo, and resource articles are in the episode show notes.
Let the hosts know what you build with DSQL and how it works for you!