
Are relational databases cool again? In this episode of the AWS Podcast, host Simon Elisha sits down
Loading summary
Lish
This is episode 704 of the AWS podcast released on January 20th, 2025. Hello everyone. Welcome back to the AWS Podcast. I'm Lish here with you. Great to have you back. And I'm joined all the way from Blighty by John Hart, who is a principal solutions architect here at aws. Welcome to the podcast, Josh.
Josh Hart
Thanks, Simon. Great to be here.
Lish
Good to have you here. Now, just off the show, we were comparing notes because Josh has a four week old child in his house. So we'll be test how well he operates on very little sleep.
Josh Hart
Yep, absolutely.
Lish
So, Josh, you've done many, many things in your career, but the thing we're going to focus on today is really about databases. And not just any old databases, but relational databases and some of the things that's been happening in this space recently. That's kind of interesting. Now, relational databases are at the core of many applications and have been for many, many decades. I myself, my first database was a DB2 mainframe database that only had level locking. So yes, I'm that old and these are technologies that work and they fit in to a lot of solutions. However, at some point when you with a relational database, you tend to bump into scale problems. And that's why, you know, early 2000s we start to really see the explosion of NoSQL databases and the ability to avoid partition challenges and have horizontal scalability is why I love DynamoDB so deeply, because it solves so many of my problems. And that's great. But for a lot of customers, they want transactional semantics, they want the things you get out of a relational database. My goodness, reporting is easy, doing joins is easy, querying is easy. Lots of stuff that's easy. And so it's not like we've forgotten about relational databases. We want to make sure we focus on that. But there are challenges that come with relational databases. And maybe let's step back and Josh, let's walk our way through what are some of the things that people tend to bump into after they developed their application? It's all running and suddenly stuff starts to happen.
Josh Hart
Yeah, absolutely. I think when we talk about relational databases, that tends to be one of the main things that people get caught up on, is how do I scale my relational database as my application grows? Very often, you see, the relational database can become the bottleneck and it can become the sticking point as you start to want to scale your application. I've heard customers before say to me, we'll just throw more tin. You know, it's a short term solution, you know, you can add more compute resource but eventually, you know you're going to hit a limit with that to how much compute resource you can give to a single node before you have to look at something a bit more strategic in terms of how you're going to scale your relational database. And additionally when you're doing that, when you're doing that scaling operation, traditionally relational databases, you're provisioning for peak that compute resource. There's no dynamic scaling that's happening there to enable you to change the amount of compute resources you have based on fluctuations in load. We've had auto scaling with compute for a long time now. People are used to that when they work in the cloud. But databases, specifically relational databases, you have to provision for, for that peak capacity. And that means that you're often over provisioned, you're wasting resources and consuming costs that otherwise could be put to better use.
Lish
And often with a lot of databases, it's not just sizing for peak, it's peak plus. Because often there's depending on who you listen to. Some folks say, well, never let your database server run more than 80% CPU or 80% utilization because weird stuff starts to happen. The in the database world, you don't want weird stuff happening.
Josh Hart
Absolutely.
Lish
And also then you've got the issue of, well, how big do you make it? And when, when the marketing department rocks up and says, hey, we've got a great campaign on the super bowl tomorrow, that's a problem.
Josh Hart
Yeah, absolutely. It needs that prior planning and that coordination, you know, is often a nice to have.
Lish
If we can't keep going big, let's say, in terms of a single node, what's stopping us from having lots of nodes like you mentioned auto scaling before, why don't we just autoscaler database?
Josh Hart
The challenge really with auto scaling your relational database is that if we look at a traditional provisioned database, take for example PostgreSQL, which is one of the most popular relational database engines, open source database engine. In order to change the amount of resource that we have allocated to the database process, we have to restart the database process. And that means that we've got some interruption to service, we've got some downtime or lowered availability, or at the very least a failover once we do that scaling operation. So that means that we can't do that vertical scaling responsively in response to changing workloads, changing patterns, without having that prior planning about when we're going to do that scaling operation during a quiet period or a maintenance window, for example.
Lish
And what about from a horizontal standpoint? Can I just have lots of copies of my database? And they all automatically work, really.
Josh Hart
There are three main challenges when we look at scaling relational databases. So we talked a little bit about the vertical scaling issue there. Eventually you hit the, that kind of brick wall and then you look at horizontal scaling. And when we talk about horizontal scaling for relational databases, very often what we're talking about there is sharding. And sharding is where we have to make the application aware that our database is sharded. So a shard being a separate instance of the database that we have some subset of our data stored on, we have some mapping table stored somewhere. And then we have to have a routing component that says, look at this mapping table, find out where we need to send this query based on which part of the key space that we're querying. So that's complexity that we're adding to the application. It's additional operational overhead that we have to manage.
Lish
And so when we talk about sharding, you touched on that fact that there's complexity suddenly because we've got to make changes at the application level. The application now needs to understand in some way where the data is. And then there's also the great question of, well, I've got my data in these multiple nodes, but I want to do an all up report on all the data that used to be easy on one node, not so easy on multiple nodes.
Josh Hart
Yeah, absolutely. There's that trade off if you have to query across the key space where you need to have either some knowledge of what are all of the shards that I need to iterate to get my data and then you add additional compute time to do it, or you need to be using dedicated reporting data warehouse in order to store that kind of flat view of the uncharted data. But yeah, it's definitely a consideration if you move to that kind of horizontal architecture.
Lish
Sharding brings a different set of skills to the table and we'll talk a bit about how we can solve for that shortly. But let's, let's talk about solutions here. Let's talk about things that we've done because customers have said, hey, this would really help us. So let's first tackle the vertical scaling challenge. Sometimes it's busy, sometimes it's not. I mean development test environments, a great example of that. But most environments represent some degree of variability of load. How do we tackle that in a different way?
Josh Hart
Yeah, I absolutely agree that development pre production environments are a great example of where you see this in extreme where they're very often not used over weekends or evenings. So if you are provisioning a relational database during that time, you're going to have that kind of wasted compute resource. So as I mentioned a moment ago, with vertical scaling databases, typically you have to restart the database process in order to add that additional resource. So what we came up with was something called Aurora serverless V2. And what serverless V2 does is it will automatically scale up and down in response to changing traffic patterns, the amount of compute resource that you have available to the database engine. And it will do that seamlessly without having to restart the database process, without having to kill any client connections. So the application is completely transparent that the scaling operation is happening. So you're able to as your database activity ramps up. So if you think of many applications follow that kind of business day bell curve where they ramp up at the start of the day, they kind of peak during the day and then they begin to drop off in the early evening. So then your database will begin to scale as your users come online. You'll be provisioned for your peak utilization throughout the day and then it will scale back off outside of business hours. So you have only the capacity that you need for that given period of time. And again for those kind of development pre production environments where they have those prolonged periods where there's no or very, very low utilization, then you can really scale your costs down.
Lish
Nothing going on. Yeah, yeah. And how quick does it respond? So if I think about more of a sort of production environment, is it seconds, minutes, hours? Like how long does it take for it to respond to those signals of load is coming?
Josh Hart
Yeah. So because we use a heuristic metric for load, so it's not just one metric that we're looking at. It's not just CPU or memory or disk I O or network bandwidth. Rather it's that composite heuristic across all of those metrics. It means that we can very responsively scale additional capacity. And typically we're looking at seconds there. It will scale in increments, it will scale in steps. So if you go, you know, 0 to 100, you'll have a period where it needs to scale up and you need to plan for that kind of ramp up activity. Don't expect it just to instantly be there at 100%.
Lish
Go from zero to 100.
Josh Hart
Absolutely. And then it will scale down more conservatively. Because what we don't want to do is get into that situation where we're thrashing the buffer pool and we're evicting pages from memory very, very quickly. So rather it will scale down conservatively as the workload begins to drop off, but we'd say typically seconds, you can begin to procure additional capacity.
Lish
So it feels very familiar to those folks who would have used auto scaling in terms of speed up, cool down type, time frames, et cetera.
Josh Hart
Yeah, absolutely.
Lish
The other thing here though is to consider is that there's no change from the application standpoint. So it sees a PostgreSQL database like it just sees the database engine it expects. But is there a limit to how big it can go?
Josh Hart
Yes. Yeah. So eventually you're still going to hit that upper limit on the maximum amount of compute capacity that you can scale to. So we actually have very recently doubled that. You're now able to scale to and we call them ACUs, which are Aurora compute units or Aurora capacity units. You're able to scale to 256Aurora capacity units now. And each capacity unit, each ACU is equivalent to 2 gigabytes of RAM and then a bit of CPU with that as well. So that means with the new 256 upper limit, we can go all the way up to 512 gigabytes of RAM available, which is, you know, fairly decent sized database.
Lish
Half a terabyte of RAM on a database is not to be sniffed at.
Josh Hart
No. Yeah, that'll get you a long way there.
Lish
So you can go big and clearly dev test. I've got to say, no brainer to use this sort of stuff. I don't create standard databases anymore and I'm doing any development work, it's always these. But at some point, you know, you're gonna need a bigger boat. It just gets too big. And as you said earlier on, you know, the traditional approach is throw more tin at it. But at some point even that gets us into trouble. Now we talked about sharding as an option and if you go to your friendly search engine, your favorite one, and ask it to explain Sharding or Gen AI engine these days, you'll get lots of answers and models and libraries and all this sort of stuff. But the fundamental thing is it's a one way door and you know, you're starting to introduce complexity. We've tried to think about this from a slightly different way and make it a bit easier for customers. And so tell us a bit about that and what that is.
Josh Hart
Yeah, absolutely. So I agree, you know, the one way door in terms of this being once you move to a sharded data architecture, it's very Difficult to come back from that, because as soon as you have two shards that have Product ID one on them, then you're not able to reconcile that. You're not able to then move to an unsharded architecture in future. So it's something that you have to be very mindful of, the fact that you're entering into this as something that you're committing to for the long term with all of the additional complexity that we talked about before that that brings with it from an application perspective. So what we're trying to do as AWS is to reduce some of the complexity that comes with introducing a sharded database architecture. And the technology that we're releasing to solve that problem is called Aurora Limitless Database. So it's currently available in preview, so you can go and try it out in preview at the moment. And what Limitless Database does is it adds this concept of a shard group to a provisioned Aurora database. And that shard group is a horizontally scalable group of shards, group of database instances that will automatically partition your data across an expandable set of nodes. So as your data footprint grows based on a shard key that you define, will automatically provision additional capacity to route your queries so that we are able to scale horizontally in response to changing traffic patterns.
Lish
So that's pretty cool because, correct me if I'm wrong, but that means we're taking away the undifferentiated heavy lifting of sharding your database. You don't have to worry about the implementation detail anymore. It's done and done robustly.
Josh Hart
Correct, we love to do that here at aws. It's take away the undifferentiated heavy lifting. And that's exactly what we're doing. We're letting you still have that same sharded architecture from the database perspective, but abstracting away the application complexity of it. And the way that we're doing that is that shard group that I mentioned there, you still have your single Aurora cluster endpoint that you connect to. So with your traditional sharding, if you were to do it yourself, roll your own sharding, you're going to have to have that routing component that you manage, and your application is going to need to talk to that to know, hey, which shard do I send this query to? With Aurora Limitless, we just still expose that single endpoint that your application connects to and then we automatically manage the routing of the query to the correct storage node. So your application doesn't need to be sharding aware. Now your Application still talks to a single endpoint, there's no application changes required. And we can then have that, you know, have our cake and eat it where we have our sharded architecture without having to change the app.
Lish
That's nice. It's very nice. Now we've talked about a couple of things. We've talked about obviously vertically scaling and horizontally scaling. And with any technology, the more you dive into it, the more complicated things are and the more details there are. One thing we haven't talked really about is the concept of connection management and database connection. So maybe talk to us about that little domain, that really interesting glue between the application and the database and what it is and why it matters to us.
Josh Hart
Yeah, and this is really the third big challenge that I see with scaling relational databases is if you've got a good strategy for vertical scaling, you've got a good strategy for horizontal scaling. The other part is how do I manage when my application is scaling and I have an influx of connections from my application to my database, these databases. Relational database technology predates cloud compute where we have these horizontally scaling connection pools that can cause these connection storms very easily and overload the database. Each connection that we have to the database, it consumes some additional resource, some cpu, some memory, even if it's not being used. And as we elastically scale up and down that connection churn, as we're opening and closing new connections can have a real material overhead on the amount of resource that we have available to serve the workload on the database. And it's really a common pain point that I see with a lot of customers is without thinking deliberately about how they manage their connection pooling and how they manage their application scaling from a connection perspective, they can get into some pretty dangerous territory.
Lish
It's kind of a silent performance killer in a way because you can be looking at your, your honkin great database server and going that's looking good. You can look at your fleet of auto scaled application instances and go that's looking good. But everything sucks still. And that's often as you say, that's where you look at and go well it's, it's, you're exhausting your, your connections. There are limits and different databases have different limits, but this, this runs into that problem. So then how do we work around that? Because obviously again a lot of customers have said hey this is a problem, please make it go away.
Josh Hart
It really is. And there's two AWS approaches. So one, which is more the traditional approach is to have a centralized connection pooling technology. So this has been patterned for a long time now. There's open source technologies that you might use in this space. So the AWS solution there is called RDS Proxy, and that's something that you provision between your application and your database that lets you control the amount of connections that you have open to the database at any one time. That's been around for a few years now, but it's still something additional that you have to provision that you have to manage the configuration of. If you've got a serverless application, it's something again, that you have to deploy into a vpc. So it's something that can have a little bit of a.
Lish
It's a thing.
Josh Hart
It's a thing. Yeah, exactly. Yeah, exactly. So very recently we released something called the RDS Data API. And this is really a way to rethink connection management, particularly when we're looking at serverless applications. And what the Data API does is it removes all of the TCP connection logic, all of the database driver logic from your data access layer. So you don't have any of that kind of SQL connection logic happening in the application. Rather, what you do is you just make an SDK call. So using the AWS SDK in the same way as if you were uploading an object to S3 or if you were querying a DynamoDB table. So it moves it to being part of the aws data plane APIs, rather than needing that persistent TCP connection between the application and the database. And what that means is, as your application scales, you don't have to worry about that client connection pooling side. The RDS Data API will manage the connection pooling for you as part of the service, and all you need to do is just send your SQL statement as an API call to the Data API.
Lish
So I'm still kind of like, from a developer standpoint, still doing a similar thing. I'm sending my SQL, doing all that sort of query response, etc. But I'm not configuring ODBC drivers or JDBC drivers and all that sort of plumbing again, kind of abstracted away. So I can just worry about getting.
Josh Hart
The data I need exactly right. Yeah, and the really nice part about that is that because I don't need to have a database driver, I don't need to have any direct connectivity to the database either. So I can have my lambda function deployed without a VPC connection or deployed in a separate VPC without connectivity to the one where my database is deployed, and I'm able to still send my SQL query via the Data API without having to set up that persistent connection and going even further than that. One of the kind of things that people often have to think through when moving to cloud databases is how do I manage my local development environment? And before, you might have had a VPN or a jump box, or you might have had to open ports over the Internet, which you don't want to be doing in order to connect to a provisioned relational database via a traditional TCP connection. But with the Data API, again, because it's just an SDK call, as long as you've got valid IAM credentials that will allow you to to make that API call, you can connect from your local machine without any network connectivity. So no VPNs, no security groups, anything like that.
Lish
Wow, that's actually a really cool pattern for, as you say that, you know, multiple team development, wanting to have access to the development database, but without all the plumbing nightmare and the firewalling, and particularly if you're working remotely, which a lot of folks are these days. That's a thing too. So we've talked about three key areas here and we've focused, I guess, on the technology part, which is as geeks, the fun part. But how does this actually help business? What does this give to an organization? And why should they think about these things?
Josh Hart
Yeah, so really the kind of key point here is that this is going to reduce the amount of operational overhead. So when we talk about scaling both from all three of those different aspects, there's solutions to those traditional solutions that you can leverage in order to solve those problems. But all of those solutions require a fairly large operational investment, and they'll take time away from your teams in order to implement them, in order to manage them, in order to scale them as your application grows. So what we are trying to do, as you said, is we're going to take some of the undifferentiated heavy lifting there from those scaling challenges and mean that you can just focus on, you know, your secret source, on what your value add is, rather than having to deal with these solved database scaling challenges. So you can focus on the innovation and the thing that really kind of differentiates you without having to be burdened by these.
Lish
And it's interesting, it's a really common refrain when you talk to IT teams and businesses, they often complain that 80% of the work is keeping the lights on, 20% is innovation and cool stuff, and they want to flip that over. And managing a database is important, but not innovative. So there's some ideas here for folks to save some money, to be more performance and to be more efficient in the way they operate. I think there's some. For most people listening, there's gonna be one thing that they could take action on. At least try. At a minimum, I'd say look at your dev test environments. Always ripe for the pickings for saving money and providing a better experience. Because the flip side of a, of having a dev test environment is usually it's not big enough for what I want to do because, you know, it costs money and we're not spending money for folks who are only using it eight hours a day. Whereas something like serverless, Aurora Serverless 2, you can have the size you need within reason to get the job done, which is great.
Josh Hart
Yeah, absolutely. And you know, you're quite right. It can be heterogeneous. You don't have to, you know, implement all of these solutions all the way across your environments because any of these technologies, from the application perspective, it's all transparent, is still all completely wire protocol compatible with whatever you're doing in your production environment. You could just implement these in your development environment to save cost. But of course, we do have customers running at significant scale in production that are using these technologies in order to mean that the database doesn't become that bottleneck.
Lish
Absolutely, absolutely. Josh, thanks so much for coming on the show and telling us all about it.
Josh Hart
My pleasure. Thanks a lot for having me and.
Lish
Thanks everyone for listening. We do love to get your feedback. AWSpartmazon.com is the place to do it. And until next time, keep on building.
AWS Podcast Episode #704: Scaling Your Relational Database on AWS – Are Relational Databases Cool Again?
Release Date: January 20, 2025
In Episode #704 of the AWS Podcast, titled "Scaling Your Relational Database on AWS – Are Relational Databases Cool Again?", hosts Lish and Josh Hart, a Principal Solutions Architect at AWS based in the UK, delve into the evolving landscape of relational databases. They explore the challenges of scaling relational databases and introduce innovative AWS solutions designed to address these issues. This comprehensive discussion is invaluable for developers and IT professionals seeking to optimize their database infrastructure on AWS.
The episode begins with Lish highlighting the enduring relevance of relational databases in modern applications, despite the rise of NoSQL databases. She shares her experience with DB2 mainframe databases, emphasizing their reliability and fit for numerous solutions. However, she acknowledges the inherent scalability issues that often prompt organizations to consider alternatives like NoSQL.
Lish (00:37):
"Relational databases are at the core of many applications and have been for many, many decades. They work and fit into a lot of solutions."
Josh concurs, pointing out that as applications grow, relational databases can become bottlenecks. He explains that simply adding more compute resources ("throwing more tin") is a short-term fix that eventually hits physical and practical limits.
Josh Hart (02:02):
"Very often, you see, the relational database can become the bottleneck and it can become the sticking point as you start to want to scale your application."
The conversation shifts to the two primary methods of scaling databases:
Lish (04:02):
"If we can't keep going big, let's say, in terms of a single node, what's stopping us from having lots of nodes like you mentioned auto scaling before, why don't we just autoscale the database?"
Josh elaborates on the complexities of each approach. Vertical scaling often requires downtime or failover during resource upgrades, making it inflexible for dynamic workloads. Horizontal scaling introduces application-level complexity through sharding, which necessitates managing multiple database instances and routing queries appropriately.
Josh Hart (05:12):
"Sharding is where we have to make the application aware that our database is sharded... that's complexity that we're adding to the application."
To address these challenges, AWS has developed solutions that simplify both vertical and horizontal scaling:
Aurora Serverless v2 allows automatic scaling of database resources in response to fluctuating workloads without requiring restarts or causing downtime. This solution is particularly beneficial for environments with variable usage patterns, such as development and pre-production.
Josh Hart (07:13):
"Aurora serverless V2... will automatically scale up and down in response to changing traffic patterns... seamlessly without having to restart the database process."
Key features include:
Lish (09:44):
"It will scale down more conservatively... you can begin to procure additional capacity... typically seconds."
Aurora Limitless Database, currently in preview, enhances horizontal scaling by abstracting the complexities of sharding. It introduces "shard groups," which automatically partition data across an expandable set of nodes based on predefined shard keys.
Josh Hart (12:08):
"Aurora Limitless Database... automatically manage the routing of the query to the correct storage node. So your application doesn't need to be sharding aware."
Benefits include:
Lish (13:41):
"That means we're taking away the undifferentiated heavy lifting of sharding your database. It's done and done robustly."
Beyond scaling, managing database connections is a critical aspect of database performance. Traditional relational databases can struggle with connection storms, which occur when applications rapidly open and close connections, leading to resource exhaustion.
Josh Hart (15:24):
"Each connection that we have to the database consumes some additional resource, some CPU, some memory... connection churn can have a real material overhead."
AWS offers two primary solutions to manage connections effectively:
RDS Proxy acts as an intermediary between applications and databases, controlling the number of active connections and enhancing database resilience.
Josh Hart (17:04):
"AWS solution there is called RDS Proxy... you have to manage the configuration of."
While effective, RDS Proxy requires provisioning and managing an additional layer, which can introduce complexity.
A more recent innovation, the RDS Data API, reimagines connection management by eliminating the need for persistent database connections. Instead, it leverages SDK calls to interact with the database, handling connection pooling internally.
Josh Hart (17:49):
"RDS Data API will manage the connection pooling for you as part of the service, and all you need to do is just send your SQL statement as an API call."
Advantages include:
Lish (19:11):
"I'm sending my SQL... but I'm not configuring ODBC drivers or JDBC drivers and all that sort of plumbing again, kind of abstracted away."
Lish and Josh emphasize the business benefits of these AWS solutions, highlighting reduced operational overhead and cost savings. By automating scaling and connection management, organizations can redirect their resources towards innovation rather than database maintenance.
Josh Hart (20:57):
"This is going to reduce the amount of operational overhead... you can focus on the innovation and the thing that really kind of differentiates you."
Lish (21:57):
"Look at your dev test environments... something like serverless, Aurora Serverless 2, you can have the size you need within reason to get the job done."
These advancements align with a common organizational goal: minimizing the time and resources spent on "keeping the lights on" in IT operations, thereby maximizing investment in innovative projects.
The episode wraps up with Lish and Josh reinforcing the importance of leveraging AWS's advanced database solutions to overcome traditional relational database challenges. By adopting Aurora Serverless v2, Aurora Limitless Database, and the RDS Data API, organizations can achieve scalable, efficient, and cost-effective database architectures that support both current and future application demands.
Lish (23:32):
"Thanks everyone for listening... Until next time, keep on building."
Key Takeaways:
For developers and IT professionals looking to optimize their relational databases on AWS, Episode #704 provides essential insights and actionable solutions to modernize and scale database infrastructures effectively.