![#697: [The Frugal Architect w/Werner Vogels] How PBS makes every penny count... For viewers like you — AWS Podcast cover](http://d3gih7jbfe3jlq.cloudfront.net/AWS-Podcast-Title-Art.jpg)
In 1969, Mr. Rogers went before the US Senate Commerce Committee to request funds to support the gro
This is episode 697 of the AWS Podcast, released on December 2, 2024.
Simon
Welcome to the official AWS podcast. G'day, Simon Elisha here. And I'm happy to bring you a special series with Werner Vogels. Here's Werner to tell you all about it.
Werner Vogels
Thanks, Simon. Welcome to the Frugal Architect podcast, where we dive into the journeys of technology leaders building cost-aware, sustainable and modern architectures. These are longer-form conversations where we explore these topics in depth, and we hope you enjoy them.
Simon
Good to have you here. And we're joined by a very special guest. We're joined by Mike Norton, and Mike is the VP of Cloud Services and Operations at PBS. Mike, welcome to the program.
Mike Norton
Thank you for having me.
Simon
So good to have you here. I'm sure most of our listeners, myself included, have watched many PBS products over time, so it's nice to sort of be on the other side of the table here.
Mike Norton
It's good to be here.
Werner Vogels
Well, especially because, you know, some amazing products have come out of PBS. And the amazing story is of course that you are a nonprofit organization, so you have a tight budget. So we're really looking forward to hearing your stories.
Simon
Absolutely.
Mike Norton
I'm happy to share it.
Simon
And big, big reach as well, which is interesting. So maybe for those who don't know who PBS is, or who maybe aren't clear on the mission of PBS, tell us a bit about that.
Mike Norton
Yeah, so you know, PBS, we are a nonprofit. We were founded in 1969 with a mission to educate, inspire and entertain. And you know, if you've ever seen, if you haven't seen it, I recommend watching Fred Rogers when he went before Congress and asked for the money to get this going.
Simon
Beseed.
Mike Norton
And honestly, I watch it every year because it is inspiring. It's why I do what I do and I think it's why we do what we do.
Simon
That's amazing. And what about your own role? So you've had an interesting journey with PBS and your own career. So maybe give us a little bit of a potted history of your career and talk to us about the work you've sort of entered into at PBS over the years.
Mike Norton
I started out as a developer. I loved to tool around. My dad brought home the original Macintosh in 1985 because he was doing his MBA and he got the budget to get a computer. And I tooled around on that thing for years with my brothers, and it led me into tech. And so I started out as a developer, loved writing software. But I was never the kind of developer that liked the whole "I write something and I throw it over the hedge to somebody else." This is back before we had cloud, back before we had terms like DevOps and Agile manifestos and any of that. I just knew that I'm writing this stuff, so I should also have a hand in how it runs. All that to say, I moved on to PBS and into operations. I spent some time at PBS as a contractor in 2009, which was just as we were starting to use AWS, went on to do some other things and then came back in an operations role. I led our ops team, which, yeah, I'm sure we'll get into this, but it was our firefighting. We were the ones responsible for keeping things running. I took a brief time away from PBS, realized that I missed the mission, and came back in an architectural role, and I'm now running our cloud team.
Simon
Amazing. Amazing. It's interesting how many people start their careers diving really deep into the technology and that sort of, I think, maintains a passion for detail and optimizations and things you have to find. I think, Werner, we're seeing a trend here of inbuilt constraints into folks.
Werner Vogels
No, I also think that no matter how far you go up in the stack, after tweaking around with PDP-9s or tweaking around at the operating system level, eventually you step up and you look at the bigger picture. But I do think every day you miss it. On the other hand, it also gives you a much broader view of the whole stack, so that even if you're a domain expert, let's say in financial systems or something like that, you still understand the file cache, you still understand interrupt schemes and things like that, which gives you a complete view. So I do think it's sort of a natural progression over time, but it enriches you. But you also miss it all the time.
Simon
Until you're in the middle of it and frustrated with anything. You can't make anything. But maybe let's talk about that, Mike, because you touched on a topic that is close to many of our listeners' hearts, not necessarily for good reasons, which is firefighting. Something's broken, it ain't working, we've got to fix it. And all industries experience different pressures, be they production schedules or a banking run. I would have to guess, and you can tell me, that broadcasting would seem to be even more intense here because you've got audiences and programs and stuff. Tell us about, I guess, where PBS found itself in the firefighting phase and what that meant for you and what that looked like for your team at the time.
Mike Norton
Absolutely. Broadcast technology is a hard one because you can't control what people are going to watch and when they're going to watch it, or what is going to be a quiet hit. Downton Abbey didn't pick up until, I want to say it was season two or three, and then all of a sudden it was a thing. And I spent every January for several years not even getting to watch the darn show because I was sitting there at 9 o'clock on Sunday nights trying to make sure that everything was just running. And when it would end and say, "To learn more, go to pbs.org/masterpiece," I was sitting there going, "No, don't." It is hard. You know, we joke with the kids team that we probably know the average bedtime of the American child because we know when the traffic dies off on PBS Kids. But on the general audience side, with things like our dramas and our news programs and whatnot, things all of a sudden just become popular. We have some idea about what time people watch things and when things are starting to get popular. But we also have to deal, as a technology team, with the fact that the business side of the house is changing things. As Netflix introduced the idea of binge watching, we started changing: okay, if you are a donating member of a PBS station, you can now watch all these things the night it releases. That then affects the technology. When I first started at PBS, the streaming of various programs happened the next day. It would air on broadcast, and the next day it would be available on streaming. So actually, Monday nights were harder than Sundays for those dramas. And then they started releasing these things in binge packages and trying to make sure you could watch it at the exact same time that it was airing over the air. And that changes the technology needs.
We had a lot of thundering herds of traffic, sometimes prepared for, sometimes just out of the blue, it could be a politician mentioning something that happened to be on a NewsHour episode four years ago. And suddenly that's popular. And it might only be popular for a couple of days, a couple of minutes, but we have to be prepared for those sorts of shifts.
Simon
So it's fascinating there, Werner. We're really talking about a shift in business requirement and customer requirement beyond technology here, aren't we, really, when we think about it?
Werner Vogels
Well, in some sense here, you have no control over the customer requirements. I mean, the customer does whatever he or she wants to do. It's more like: where is the business willing to put its money? Which are the things that need to be highly available all the time? Which things may be on the back burner, or maybe time shifted, or things like that? Those should all be business decisions, not us as technologists making the decisions for the business.
Mike Norton
Exactly. I have been asked from time to time, can you please predict what our CDN bill will be next month or for the next year? And I have to say, is Ken Burns releasing a documentary this year? Do we know what's going to be popular? I can tell you what our price per gigabyte streamed is. We can do some level of estimation: if this is going to be popular, it will probably do that. But we've also had some cases where we've decided to make some changes around how we encode our video. And those have led to extreme cost savings because we figured out better encodings that produce smaller file sizes but still quality content. And that throws off all your estimates. There is so much more than technology here. It's business decisions, it's user decisions.
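The estimate Mike describes (a known price per gigabyte, an uncertain audience) can be sketched in a few lines of Python. This is a minimal illustration, not PBS's model: the function names, the viewer-hour figure, the bitrates and the price are all hypothetical, and it assumes the bill scales linearly with bytes delivered.

```python
def gb_per_viewer_hour(avg_bitrate_mbps):
    """Convert an average stream bitrate (megabits/s) to decimal GB per hour."""
    return avg_bitrate_mbps * 3600 / 8 / 1000


def monthly_cdn_cost(viewer_hours, avg_bitrate_mbps, price_per_gb):
    """Rough bill: hours watched x GB per hour x price per GB.

    All three inputs are assumptions; viewer_hours is the part nobody
    can predict ahead of a surprise hit.
    """
    return viewer_hours * gb_per_viewer_hour(avg_bitrate_mbps) * price_per_gb


# Hypothetical numbers: same audience, old encoding vs. one at half the bitrate.
before = monthly_cdn_cost(1_000_000, 4.0, 0.02)
after = monthly_cdn_cost(1_000_000, 2.0, 0.02)
print(f"before: ${before:,.0f}  after: ${after:,.0f}")
```

Under these assumptions an encoding that halves the average bitrate halves the bill, which is why a codec change can invalidate a forecast even when the audience estimate was right.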
Werner Vogels
By the way, I'm just curious, is HEVC so much more efficient than any of the other encodings?
Mike Norton
It is. I want to say we have halved our CDN bill.
Simon
Wow.
Mike Norton
By re-encoding our popular content. It's insane. And it's thrown all of our estimates off.
Simon
In a good way.
Mike Norton
That was both a technology and a business decision. And that's where I think, Werner, you were talking about this, that you've got your roots in the deep technology, but then there's just so much more to it. And that, I think, is what I love about being in my role: it's not just technology. I'm not doing technology for technology's sake. I'm sure there are plenty of people out there who love moving bits and bytes around and that's what gives them joy. But what I love is being able to see how I'm saving a company money. A company with a mission that doesn't have a lot of money. And so we just do the best we can with what we have.
Werner Vogels
Well, I also assume that old Sesame Street episodes are not in 4K yet.
Mike Norton
No. And those are decisions we have to make around what content is the most compelling to see in 4K. Right. Like an episode of Nature or Nova or a drama. Maybe you want that in 4K. Something like an old kids show that came out 20 years ago, that's animated, doesn't need to be in 4K. And frankly, the parents who are streaming those shows over their cell phone in a doctor's office are probably happy that it's not in 4K, for their data bill. It's all these things to weigh and decisions to make around what can we do with what we have and how do we make as good of an experience for the viewer as we can, without just doing it because we can.
Simon
And it's interesting. There are a couple of nuanced things you've touched on there, one of which is that just because you can doesn't mean you should. As a neophyte to broadcasting, my default position is: well, let's get it at the best resolution we can, and everything should be 4K. And why wouldn't you do that? And yet, when you actually think about the situation, what you've articulated there is that there are actually many times where you don't want that, and that has a benefit. And then the other element is you've also challenged almost a foundational concept, which is the encoding algorithms and mechanisms that you're using, which is that classic thing of if it ain't broke, don't fix it, just leave it, don't touch it. Yet you've challenged that. Help us think about that mindset a little bit more. What sort of questions were you asking, or what sort of things were coming up, to get you to that point? Because you didn't just wake up in the morning and go, you know what we're going to do? We're going to change our codec. Because that sounds like a really easy project.
Mike Norton
No, that was not an easy project. And I think our mindset was, we've been doing this for a long time this way. We were based on Apple's original encoding ladders, because that's what we did. You know, it is, again, the hard part of being a nonprofit in this space. We've got fewer employees across the company than I would guess a Netflix or an Amazon Prime has in engineers thinking about these problems. And that's across the entire company. We did at one point realize, okay, we're spending a lot on our CDN. We're still using the original encoding ladders. Let's just take a look. And we've got some tooling in place to look at what the user experience is, to be able to understand if people are making it past, you know, we don't do ads, but we have our funding pods at the beginning. If you've watched any PBS drama, you've probably thought about going on a river cruise on the Danube. But we have tooling in place to understand: are people falling off at that point? Where are people's videos starting to lose quality and then come back? And so we used that tooling to then say, all right, well, let's just try, let's see what happens if we use MediaConvert to try a different encoding. And so we started with just a video here, a video there. But the key is we had that tooling in place to be able to then measure and say there's no change, or it's actually even better for the user. And it's also then costing us like half as much. So why not? We had to make some decisions then. We have a large back catalog, and that back catalog is, like I said, sometimes popular, sometimes not. It's popular when it becomes popular, but we have a lot of local content. Not a lot of people are going and watching that eight-year-old clip from NewsHour or that 10-year-old episode about breakfast places in Minnesota. But when they become popular, they become popular.
But really, for most of our content, it's the fresh stuff that's costing us. That's what people are watching. And so we started by saying, let's just change how we're encoding the stuff coming in. Let's leave the back catalog as is. And we then spent a lot of time going back through that back catalog and finding the things that are still popular. People still like to watch the entirety of Downton Abbey. So let's re-encode that. But maybe we don't worry about that one episode of NewsHour from eight years ago. It's not worth the one-time cost to re-encode it if it's only getting watched twice a month. We ran Athena queries on CloudFront logs just out the wazoo, trying to understand what people are watching. And that is hard work because it's not like you have a log entry that says somebody watched this show. If you understand streaming video, you've got playlist files and then 6-second chunks of video, and you're trying to piece all that together and understand what people were watching. When is it worth the one-time cost to reconvert for a long-term savings?
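To make the log forensics concrete, here is a minimal Python sketch of the two steps Mike describes: rolling 6-second segment requests up into watch time per video, then asking how long a one-time re-encode takes to pay for itself. The URL layout (`/videos/<video_id>/<rendition>/<segment>.ts`), the function names and the dollar figures are assumptions for illustration only; PBS did this with Athena queries over CloudFront logs, not with this code.

```python
from collections import defaultdict

SEGMENT_SECONDS = 6  # chunk length mentioned in the conversation


def watch_hours_by_video(request_paths):
    """Estimate hours watched per video from CDN request paths.

    Assumes a hypothetical layout /videos/<video_id>/<rendition>/<segment>.ts.
    Playlist requests (.m3u8) and anything else are ignored; every segment
    request counts as six seconds watched.
    """
    seconds = defaultdict(int)
    for path in request_paths:
        parts = path.strip("/").split("/")
        is_segment = (
            len(parts) == 4 and parts[0] == "videos" and parts[3].endswith(".ts")
        )
        if is_segment:
            seconds[parts[1]] += SEGMENT_SECONDS
    return {video_id: total / 3600 for video_id, total in seconds.items()}


def months_to_break_even(reencode_cost, monthly_cdn_cost, savings_fraction):
    """Months until a one-time re-encode pays for itself, or None if it never does."""
    monthly_savings = monthly_cdn_cost * savings_fraction
    if monthly_savings <= 0:
        return None
    return reencode_cost / monthly_savings


# Hypothetical title: $100 to re-encode, $40/month in delivery, 50% smaller files.
print(months_to_break_even(100, 40, 0.5))
```

A title that is watched twice a month has a monthly delivery cost near zero, so its break-even horizon stretches out indefinitely, which matches the decision to leave the cold back catalog alone.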
Simon
It's like forensics.
Mike Norton
It is, it is. And it's the same thing you do with server sizes and, you know, ElastiCache cluster sizes. It's trying to just understand what we need, where we can spend our money to optimize, and where it's not worth it.
Simon
Yeah, yeah.
Mike Norton
And that's the huge thing, realizing those places where it's just not worth the time. Yes, this old ElastiCache cluster, we're probably spending more money on it than we should, but the cost of optimizing it isn't worth it for that particular thing. Our money is better spent doing this.
Simon
Mike, one of the things you talked about earlier was the concept of thundering herds. And for those listening who have not come across that wonderful experience, it's basically load that comes unexpectedly and often brings down the backend system it's loading, and then has that wonderful thing where, as the system is trying to come up, more people keep hitting refresh and destroying your service, and everyone has a really bad day. Now you have an awesome example of a thundering herd and how you solved it. And it relates to one of my favorite PBS programs, actually, the Ken Burns Vietnam War program. Tell us about what happened when this illustrious producer produced an amazing program about a topic very close to the hearts of not just Americans, but people around the world, from different perspectives. What happened?
Mike Norton
We got a lot of load all at the same time. It was rough. It was before we had really entered into our move to containers. And so we were using OpsWorks. I miss it dearly. It was a wonderful product at the time. And for those who don't know, it was Amazon's, it was a wonderful move away from building servers by hand. OpsWorks was on our path to containers. When we first went into the cloud, gosh, we were using Amazon before you guys had Auto Scaling. We had to, just a couple of years ago, move some things out of EC2 Classic because we had things that were running before there were VPCs. We were old school. And in a sense, when we first started, AWS was a data center in the cloud for us. We had not yet moved to that idea that servers are cattle, not pets. And so OpsWorks was a step on that journey. But with our servers, by the time all of the recipes got applied to the base image, the traffic was already gone. In that scenario, what we should have done was probably have pre-applied a bunch of those recipes and built an AMI that was baked, partially baked, somewhat baked.
Werner Vogels
I do think one of the cool things with OpsWorks was that it had time-based scaling.
Mike Norton
Yes.
Werner Vogels
I think that's sort of your particular case, where basically all your images are still servers, not flexible or whatever. But the fact that you could say, at 6:30 tonight, we need 10 more of those and they need to be ready by that time, I think that was one of the biggest wins of using OpsWorks.
Mike Norton
Yes, absolutely. We had a combination. We had time-based scaling, because we knew at this time of day, just scale up, you need some more servers for this thing. And that time of day was different for different things. Like for kids, scale up more in the morning when parents hand their kids their iPads so they can go back to sleep, you know, and for GA, it was do that at 5:00 at night. But we then also had the scaling based on metrics, because we knew we needed more servers available for load from 5 until 9 Eastern, but you never knew when something was going to get popular. So we also had metrics-based scaling, but it still couldn't scale fast enough.
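The combination Mike describes, a scheduled floor plus a metric-driven target, can be sketched as taking the larger of the two. This Python sketch is an illustration under stated assumptions, not OpsWorks' actual algorithm: the function names, the 50% CPU target, the default of two servers and the schedule entries are all hypothetical.

```python
import math


def scheduled_floor(hour, schedule, default=2):
    """Minimum server count for an hour of the day (0-23).

    schedule is a list of (start_hour, end_hour, min_servers) tuples with an
    exclusive end hour; the default of 2 is an arbitrary assumption.
    """
    floor = default
    for start, end, servers in schedule:
        if start <= hour < end:
            floor = max(floor, servers)
    return floor


def metric_target(current_servers, avg_cpu, target_cpu=50.0):
    """Server count that would bring average CPU back toward target_cpu."""
    return max(1, math.ceil(current_servers * avg_cpu / target_cpu))


def desired_servers(hour, current_servers, avg_cpu, schedule):
    """Take whichever is larger: the scheduled floor or the metric-driven target."""
    return max(scheduled_floor(hour, schedule), metric_target(current_servers, avg_cpu))


# Hypothetical general-audience schedule: at least 10 servers from 5 until 9.
ga_schedule = [(17, 21, 10)]
print(desired_servers(18, 4, 30.0, ga_schedule))  # inside the scheduled window
print(desired_servers(23, 4, 90.0, ga_schedule))  # unexpected late-night spike
```

Neither rule helps if new servers take longer to become ready than the spike lasts, which is the failure Mike goes on to describe.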
Simon
Tell us about that scale, because you talked about the fact that you've got to have metrics, you've got to be monitoring, and in this particular situation you had the metrics. You were tracking memory and you were tracking CPU. You were doing the right things, but you weren't necessarily getting the scaling you thought you were going to get. Help us understand what was going on and what sort of popped into your head.
Mike Norton
We had all the right things, but it then comes back to having some deep knowledge of how the systems work. It's one thing to say I can throw more money at this and scale up more servers and whatnot. There was stuff happening at the OS level that we weren't prepared for. We did a lot of work. I think I spent multiple nights in the office that week just trying to keep it going. And in the end, the solution was doing containers. And we started our journey into containers with running our own EC2 hosts for them, with ECS. I am not ashamed to say we are not a Kubernetes shop. I don't feel that we have the level of complexity that makes Kubernetes necessary. ECS has been fine and has done the job. And especially when we moved to Fargate, where we didn't have to manage underlying host servers, it's been a huge win. I mean, we have cut costs on our compute dramatically, and the fact that I can then pay for Fargate with a savings plan means I don't have to be sitting around trying to divine what kinds of instances we need: are we reserving, you know, M5 larges, or are we doing Cs? That's just out of my hands. I pay for a savings plan and it's done.
Simon
It makes it nice and easy. And I think the fact that there's cost savings associated certainly helps. But I'm guessing that when you, after many of those late nights, walked in the office one day and said, hey team, let's do this thing called containers, the business side and your colleagues didn't all go, yes, Mike, that's what we've always dreamt of doing. Let's go ahead and do it. What did you have to like, how did you sell this? How did you explain it? How did you even get them to recognize this was something worth doing?
Mike Norton
It was a job. I have been blessed with really good partners on the engineering team who see that we're suffering. My operations team is the one who gets the page. They're the ones who get the call at 10:00 at night, 11:00 at night, whatever time of night. You know, our engineers are not on that call tree, but they're the ones who we have to call in ultimately a lot of times, because if it's a scaling problem or an AWS problem, my team can handle it. If it's that the engineer just wrote a really, really bad function that is not optimized, then we have to call them in. What I think we were able to do was to help the engineering team understand. I'm thinking about Law 1 of the Frugal Architect: cost is a non-functional requirement. But guess what, so is performance, so is security. And that's sort of been my mission to the product development team at large at PBS: to help them understand that my team, we're not ogres, but we are going to be here to say, I know that the powers that be have said, please put this button in place and make it look like this. And that's wonderful stuff. But my team is there to make sure that it works, that it works under load, that it's secure, that it doesn't cost us an arm and a leg. And so my team is really the team of non-functional requirements. And with that move to containers, we had teams that were already doing containers for development environments. And so what it really was, was a lot of work over time to get ourselves on product roadmaps and to help the product teams understand this is a win. I know it's not a flashy new feature, but being able to deploy 8 times a day or 20 times a day or a thousand times a day, that's important, because it means that when you have a need come in, deployments are just literally committing code, and the CI/CD takes it on and it's deployed, and we can roll back easily. That wasn't always the case.
And so the most work my team had to do was just helping teams get into Docker in general. And then from there we were able to say, okay, great, we'll take this the rest of the way, but let's then talk about task definitions and all the things in ECS. And it was a lot of just working together and communicating in that context.
Werner Vogels
Mike, if your team is responsible, let's say, for cost, performance and resilience, how do you make sure the engineers also, let's say, have that on their plate? Because it can't be just on your plate. I think it's a shared thing. So how do you make sure that everybody else has their noses in the same direction?
Mike Norton
At least once a year, I try to talk with the engineers and just show them their costs. We have lunch-and-learn kind of things that we do, engineering roundtables. And I'll show them, this is our bill. And the number of times that I have had people's eyes just open up like dinner plate size when they're like, wait a minute, I had no idea. Another thing we've been doing: my team is the team that's responsible for looking at RIs and savings plans and whatnot. And for a long time, for reasons of not having a lot of people, we would just renew the things that were already out there, and then we'd find ourselves six months later with a bunch of unused RIs. Last year I asked my team, I said, have a conversation with each team before we renew our RIs and just make sure that they understand what they're running and that they're okay with that. And I had to then have a conversation with our finance team and say, you're going to see a spike in on-demand costs for a little bit, because I need a few weeks to have my team go and sit down with each of the teams and make sure they're right-sized. We had a fascinating example where there was a team that was running, I want to say it was an 8xl i3 ElastiCache instance. Turns out they didn't need it. When we talked with them and showed them the metrics, all of a sudden a light bulb went off over somebody's head and they were like, hold on. I turned that up for an annual meeting with the stations two years ago, where we were told, this cannot go down during this meeting. So, I want to say that was a $68,000 a year instance that was running, and we downsized it 8x and then reserved it, which means I effectively got about a 15x savings by just having a conversation that took 20 minutes.
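The arithmetic behind that "about 15x" can be worked through in Python. This is a back-of-the-envelope sketch that assumes instance price scales linearly with size; the function names are hypothetical, and the reserved discount is inferred from the rounded figures quoted in the conversation rather than taken from any price list.

```python
def combined_savings_factor(size_reduction, reserved_discount):
    """Overall cost reduction from downsizing and then reserving.

    size_reduction: 8 means one-eighth the size (assumes linear pricing).
    reserved_discount: fraction off the on-demand price, e.g. 0.4 for 40%.
    """
    return size_reduction / (1.0 - reserved_discount)


def implied_reserved_discount(size_reduction, overall_factor):
    """Reserved discount that would turn size_reduction into overall_factor."""
    return 1.0 - size_reduction / overall_factor


original_annual = 68_000                      # figure quoted in the conversation
after_downsizing = original_annual / 8        # on-demand cost at one-eighth the size
discount = implied_reserved_discount(8, 15)   # what "about 15x" would imply
after_reserving = after_downsizing * (1 - discount)
print(f"downsized: ${after_downsizing:,.0f}/yr, "
      f"implied discount: {discount:.0%}, reserved: ${after_reserving:,.0f}/yr")
```

Under these assumptions most of the saving comes from the 8x downsizing; the reservation roughly doubles it again.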
And so it's those kinds of conversations because the beauty of the cloud is giving engineers the ability to not be blocked and to deploy things. But the curse of it is that they're going to lick their finger, put it up in the wind and go, I'm not going to get fired if this doesn't go down. So I'm going to just pick the thing that seems beefy enough to, to handle load. Getting better at observing it, getting better at understanding it to your point in time. And it's getting the teams to understand it. We're not there yet where I want to be. In a perfect world, I would have each team understand exactly what their product costs and be aware of it. And I've got teams that are more understanding of that and teams that are less understanding of that. And I've got teams that are actually putting quarterly, you know, spend two points of the sprint on cost and they will come back to me with a hundred dollars a week in savings and I'm going to give them the highest praise because it's something they're trying, which.
Simon
Shows they're paying attention.
Mike Norton
Yeah, it's crazy because you could have a team spend, you know, several cycles in a sprint and they find a hundred dollars in cost, and that's awesome. And then you have a team that goes and re encodes our content library and it saves us tens of thousands of dollars a month. But it's all good because it's all about understanding this isn't free.
Werner Vogels
Do you often see that? I mean, you must have a view on the container side of things if you're running things. Do you ever go back to a team and say, joe's set of containers look remarkably like yours, but yours is twice as expensive? Do you have any insights into kind of things like that?
Mike Norton
Yes. We have to be giving people the ability to own their own stuff, their own destiny. And there's a lot of trade-off that has to happen around how much do we force and how much do we just say, yeah, you're all doing things your own way. But it's not my job necessarily to come in and say this is the way. It's trade-offs. Right. Everything is trade-offs. I don't want my team to be the police, but I also don't want to have a Wild West scenario, which is what it was when we moved to the cloud. We were just doing whatever we wanted when we moved to AWS because we suddenly didn't have Big Brother IT.
Simon
You could move with speed. Yeah, yeah, yeah, you can move with speed. But let's, let's unpack that a little bit more. Cause it's this interesting dichotomy and I think your organization has worked really hard to get to sort of the right place in that. Which is. You don't want to be the department of. No, but not everyone can do everything. So you need these sort of, these guardrails. How do you think about guardrails? How did you implement those? What's, what's the approach there? Because I think that's super valuable for a lot of other organizations too.
Mike Norton
Sure. One of the things we're trying to do is build Terraform modules. For example, this is the PBS way we do S3 buckets: make sure it's tagged right, you're not accidentally setting it up with public access, and so on and so forth. I haven't gone all the way down this route yet, but there is a scenario where my team could be building product for the engineers and looking at it in that way. I think thus far, with our attempts at that, a lot of it has been sort of a field of dreams. If you build it, they will come. And they don't always come.
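The two guardrails Mike names for buckets (correct tags, no accidental public access) can be expressed as a small policy check. PBS encodes these in Terraform modules; the Python below is only a sketch of the same rules, with a hypothetical tag policy, a hypothetical dict shape for a bucket, and invented bucket names.

```python
REQUIRED_TAGS = {"team", "project", "environment"}  # hypothetical tag policy


def bucket_violations(bucket):
    """Return guardrail problems for one bucket description.

    bucket is a plain dict with 'name', 'tags' (dict) and
    'block_public_access' (bool); this shape is an assumption for the
    sketch, not an AWS or Terraform data structure.
    """
    problems = []
    missing = REQUIRED_TAGS - set(bucket.get("tags", {}))
    if missing:
        problems.append("missing tags: " + ", ".join(sorted(missing)))
    if not bucket.get("block_public_access", False):
        problems.append("public access is not blocked")
    return problems


# Hypothetical buckets: one follows the house rules, one does not.
buckets = [
    {"name": "media-prod",
     "tags": {"team": "video", "project": "player", "environment": "prod"},
     "block_public_access": True},
    {"name": "scratch", "tags": {"team": "video"}},
]
for b in buckets:
    print(b["name"], bucket_violations(b) or "ok")
```

A check like this only reports; shipping the same rules as defaults inside a shared module is the "guide rather than police" approach described in the conversation.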
Simon
Yeah.
Mike Norton
And so, you know, again, it comes back to: how much do you lay down the law and how much do you guide and inspire people to do the right thing? In some ways it's like parenting. All you can do is try to instill values in your kids and then set them off and hope that that goes well, versus, I'm going to be the police here and you can only do it this way and you can only do it that way. And so a lot of it is just trying to make people aware, trying to make people partners in this venture. And the more we do that, I feel like the more people actually amaze me, when all of a sudden they are coming to me saying, hey, I found this cost savings. I found this better way to do things. I mean, we've had engineers who have built whole deployment systems for applications. We didn't ask them to do that, we didn't tell them to do that, but they just realized there are commonalities across these various applications. Let's come up with an opinionated way of deploying applications. And the hard thing, being as small and scrappy as we are, is that you then have to look at those things eight years later and say, this isn't working anymore. The guys who built this don't work here. And it was never an actual product. It was just a skunkworks thing that we built. Do we kill it? Do we improve it? Do we replace it? It's just constant change and constant optimization. And it doesn't all happen at once.
Werner Vogels
Yeah, it's often interesting to see what happens if you don't really track, in detail, every component of your site or something like that. We had features on Amazon.com that we turned off after two years and no customer ever complained about it. Did we know that? Well, you know, often the ownership in all of this is extremely important. You know, who owns that piece of software or who owns that particular service, and if nobody is there but it's just running by itself, those are good flags to start to look at it.
Mike Norton
Absolutely. I have had cases. There are people who will refer to me as the Grim Reaper, because I will find old websites that, yes, they're running, they look terrible. We have got sites that have a fax form to buy the VHS tapes.
Simon
That's awesome.
Mike Norton
And I have sometimes gone to teams and said, hey, so we've got this thing. I feel like we should probably kill it. It's probably going to be a security problem. It's just running. And they'll be like, no, no, no, no. We've got all these people who are using it. And I'm like, all right, it costs us X amount of money per month. And they're like, kill it with fire. Fine.
Simon
It's gone.
Mike Norton
And it's being able to bring that kind of, okay, I've looked at this. This is a problem. This is what it's costing us. It's bringing that data to the equation and that conversation that isn't always there. And maybe it's better at companies that are for profit, where they have a better sense of what is this costing us per user of our SaaS platform and so on and so forth. We don't have a lot of that. But I can go to teams and say, this thing, it's running on an old version of PHP and it's costing my team time and money to patch it and upgrade it and monitor it and all these things, and then have that conversation: what is the value of this? But that being said, we do all kinds of things just because it's the right thing, not because it's cost effective. And so that's the other balance there. We have a mission to what we're doing. And so sometimes we're doing things because it's the right thing to do, not because it's cost effective. And so that's what I love about the role, that I'm not just trying to be the guy shaving every penny.
Simon
Just squeezing the dollars and cents all the time. Yeah.
Mike Norton
Yes.
Werner Vogels
Yeah. Plus probably everybody else has the same sense of mission if you work at PBS, whether it's customer centricity versus, you know, at what cost do you deliver it. But you know, if you're on a mission to be really customer centric, the bottom line actually doesn't really matter that terribly much because it's all about what you do for your customers.
Mike Norton
Exactly, exactly. We are here to serve the American public and that means sometimes you do something that doesn't necessarily make sense on paper, but you do it. But it also means being able to understand trade-offs and realize this thing just isn't making sense anymore. I mean, we used to have a game for kids called Kart Kingdom. You could build your go buggy and drive it around in this thing. And there was a point in time where the grant that funded that had ended and we were still spending a lot of money running it, and we had to make a hard decision: that's not worth the money it's costing us to run. It's not necessarily furthering the mission where it needs to be. So we have to make decisions about what we cut, what we keep. And what I love about it is that there is a mission behind it. And so it's not just dollars and cents.
Simon
It's not just arbitrary or capricious. There's values, there's processes. But it sounds as well here, Mike, that you're really helping the organization, and your own part of the organization is trying to move from firefighting to fireproofing. And it's a different discipline, isn't it? It's a different mindset and it's not obvious to folks straight away because often we've all come from firefighting rather than fireproofing.
Mike Norton
It's totally different. It requires thinking about things ahead of time. It requires getting into the conversation early. Too often my team will get called in a day before a launch and they don't even know where they're launching it. They're like, hey, I heard you're the guys to talk to about getting this up in the cloud, you know, tomorrow. And we're having to have some hard conversations about, okay, hold on, why did you build it this way? You know, we do things with containers. You're not doing that. And so, yeah, fireproofing, to my mind, is about being in the conversation early, early, early, so that you can help guide those technical decisions that are made. Because again, going back to it, the beauty of the cloud is that everyone can make decisions as needed. But the curse of it is that everyone is making decisions as they need. And now, you know, you have a team that is dealing with the fallout of that. And so the more that my team can be involved early and talk through, hey, you know, you're choosing to use whatever the technology, you know, you're building this in RDS. Does it really need to be a full database? Is this something that you could use with Dynamo? Those kinds of decisions made early on mean that you don't have all these sunk costs of developers and people just making decisions that they don't realize are going to be a problem. So yeah, it's that mentality shift around having those conversations early on. What I've been trying to help people understand at PBS is that it's all those non-functional requirements. Yes, you've been asked to make a donate button that is blue and this many pixels by this many pixels and it's going to do X. But there are implications to what you're doing and please, please, please have my team involved from the beginning. We could know, you know what, we made a bad RI purchase last year and we've got a fleet of unused RIs for M5 whatevers. So please just, let's just use those.
Simon
Yeah, yeah, yeah.
Mike Norton
But they don't know that because they're not that intimate with the bill every month.
Simon
It's that mind shift change that really is important. Now, Mike, as we come towards the end of time here, there's a story I'd love you to tell because it's always fascinating to hear what goes on behind the curtain of things that we all see and almost take for granted. So one thing that I'm sure most of our listeners are familiar with, there's a little doodle on the Google website that will change from day to day depending on events and things going on. And you had an interaction with the doodle. And the doodle is more than an image. Tell us about that.
Mike Norton
We got selected for the Google Doodle back in the day and our on-prem infrastructure was not prepared for it, not in the slightest. And it was to the point where, I want to say from the stories I've heard, there were people at PBS who were just saying, no, we can't be featured on the Google Doodle. Wow. And the decision was made: move that website to AWS and problem solved. And it's the same story for how we ended up in streaming in AWS. The first streaming video app at PBS (we didn't even have an engineering team at that time) was delivered to us and you could have exactly two people watching a video per server. And our IT department was like, no, we cannot give you a hundred servers to handle your load. And so we moved to AWS. Again, that was back before there was even Auto Scaling in AWS. So we had people, like, writing scripts on cron jobs to scale things up at certain times and scale things down. But it made things work. And what it did was, it meant that we didn't have those, no, you can't do this because the hardware is not there or our network can't handle it. We can just pivot and make this work. And I know from the streaming side of things that meant we were spending a lot of money on AWS instances from 5 till 9 o'clock at night or whatever. But it gave us time to fix the application to make it actually perform properly. And then we didn't need a hundred servers at one time. We could have a lot less to handle all that load. And the same thing with the Google Doodle, like, you want to have that kind of presence and we want to take the opportunity.
Simon
Right.
Mike Norton
Those opportunities are important. And so being able to pivot, even if it means that this is not the best way to do it long term, we can get some breathing room to fix it and then do it the right way. If, in order to run this thing, you have to actually rack some servers and run them, that doesn't work. But in a world where you can say, you know what, we can handle the extra cost for the 48 hours it takes us to get this better, that is the best part.
Simon
It's interesting too, hearing about sort of those past stories. I guess, Werner, from your perspective, and from mine as a bit of an older hand, like, this is the great example of this feedback we're constantly getting from customers saying, it would be great if my server could turn on at this time and turn off at that time, et cetera. These are these constant signals that have always come through and it's so gratifying to see customers have this in their hands now.
Werner Vogels
Yeah, there was a great balance to be had there. Yeah. And it's difficult. There's always conflicting kind of opinions. The other thing, of course, is that when you guys first started moving to the cloud, it was still 2008, something like that. 2009, I believe. These days, the expectations for any digital service are that it's always up and it's always performant and you can get any content anywhere in the world, wherever you are. The expectations have changed dramatically in the past 10, 15 years. And that puts a lot of stress, I think, on every engineering and every operational organization as well, because what we expected of a website in 1999 was that it would be offline half of the time and you wouldn't be surprised. Now this is something that we do not expect anymore. And you'll get a call from your board or from your boss if things are offline for five minutes. So expectations have changed dramatically, I think, over the past 15 years, which makes us engineers think very differently about engineering and about cost, because those expectations come at a cost.
Simon
It's very true. And you kind of have that construction icon that we used to have in the old days on the websites.
Mike Norton
Exactly. And those decisions are the hard ones because again, it's not up to a technical team to define uptime. That's a business requirement. And I have at times been asked, how do we make sure that we're, you know, fully, highly available? Like, well, I can double our costs. I can put it in two regions. Yeah, but I don't think you need that because, generally speaking, being across AZs, I mean, we're in us-east-1. Like, we're old school. We are in us-east-1 and we're across AZs, and it's fine, it's good enough. Until somebody tells me, you cannot absolutely ever be down at any minute, then I will go and I will build that. And that won't be all of our infrastructure. That will be the things that you need me to. And so you know, our streaming video is S3, fronted by CloudFront. That's always up. It's already multi-region, multi-PoP. It just works. I don't have to worry about that. But until somebody comes and tells me you must absolutely 100% be able to bring in a new video from our partners on Friday afternoon, this has to be up, then I'll have to make some decisions and present them data and say, all right, I can do that, but that means that I have to do these things.
Simon
This is a trade off.
Mike Norton
And you want me to use global load balancers and all these things? Sure, I can do that. But a lot of what my team is doing is trying to help communicate: this is up all the time, for the most part. Yes, it's down sometimes. How much does that matter? What do you want us to do? Here's what it will cost to do it. And then have those conversations.
Simon
Makes sense. Makes sense. Now we're talking frugality. We've talked about the mission. Mike, there is a dollar that was spent with PBS that I think you should tell us about because it's pretty remarkable and probably helps us understand why you have clearly such passion for the work that you do.
Mike Norton
Yeah, we had a little boy named Noah who liked our shows and he sent us a dollar in the mail and said, I would love to have a show. I think it was called Superheroes to the Rescue. And the kids team got that and they turned around a quick website and sent it back. Hey, go check out this URL. And still up and running today. PBSKids.org Superheroes to the rescue.
Simon
Wow.
Mike Norton
And it has a "Dear Noah, thank you for, you know, sending us this dollar. You know, here's what we could do." And that's what it's all about, is knowing that we are inspiring these people and inspiring children and educating adults and children. It's that mission that is, I think, what drives all of the people working at PBS. It's being able to know that we're doing this not for an extra buck, not that there's anything wrong with that, but we're doing it because we're making a difference and it really comes down to the viewers like you.
Simon
That's so awesome. So awesome. Mike, thanks so much for sharing a bit more about your story and the PBS story as well today.
Mike Norton
Thank you for having me. This is awesome.
Simon
And Werner, as always, fascinated to hear all your stories and insights as well.
Werner Vogels
Well, I think in this particular case, I should really point to Mr. Rogers' testimony before Congress, because I think if you want to have some dust in your eyes, I think that's a pretty good one to watch.
Simon
Yeah. Very dusty rooms. Very dusty rooms.
Mike Norton
Absolutely.
Simon
And thanks everyone for listening. We do love to get your feedback. awspodcast@amazon.com is the place to do it. And until next time, keep on building.
AWS Podcast - Episode #697: "The Frugal Architect with Werner Vogels – How PBS Makes Every Penny Count... For Viewers Like You"
Release Date: December 2, 2024
In Episode #697 of the AWS Podcast, hosted by Simon Elisha, listeners are treated to an insightful conversation featuring Werner Vogels, Chief Technology Officer at Amazon Web Services, and Mike Norton, Vice President of Cloud Services and Operations at PBS. This episode delves into how PBS, a nonprofit organization, strategically manages its technology infrastructure to maximize efficiency and uphold its mission of educating, inspiring, and entertaining the American public.
Mike Norton shares his passion for technology, tracing his roots back to tinkering with the original Macintosh in 1985. His career trajectory led him to PBS in 2009 when the organization began adopting AWS. Initially working as a contractor, Mike transitioned into an operations role, eventually leading the operations team and now heading the cloud team.
Mike Norton [02:12]: "I started out as a developer. I loved to tool around... I just knew that I'm writing this stuff. I should also have a hand in how it runs."
Broadcast technology presents unique challenges due to unpredictable viewer behaviors and sudden spikes in demand. Mike recounts the intense periods during high-profile broadcasts, such as when popular shows like "Downton Abbey" gain unexpected traction. Managing these "thundering herds" of traffic required PBS to adapt rapidly, often leading to late nights ensuring systems remained operational.
Mike Norton [05:41]: "We started changing our encoding methods, which led to extreme cost savings because we figured out better encodings that are smaller file sizes but still quality content."
Transitioning to AWS allowed PBS to handle sudden increases in traffic more gracefully. Initially using OpsWorks, PBS faced limitations in scaling rapidly during peak times. The move to containers and Amazon ECS (Elastic Container Service) with Fargate revolutionized their operations, enabling automatic scaling and significant cost reductions.
Mike Norton [21:21]: "We had to make some decisions about how much to scale up and where to optimize costs without compromising performance."
A pivotal strategy for PBS's cost management was revisiting their video encoding methods. By adopting more efficient encoding algorithms, PBS halved their CDN (Content Delivery Network) bills without sacrificing video quality. This move was not only a technical achievement but also a strategic business decision, underlining the importance of cross-functional collaboration between technology and business teams.
Mike Norton [10:17]: "By re-encoding our popular content, we effectively saved about 15x in costs just by making informed decisions."
Mike emphasizes the cultural and operational shift within PBS from reactive firefighting to proactive fireproofing. This involved early engagement of the cloud team in the development process to guide technical decisions, ensuring scalability, cost-efficiency, and resilience from the outset. Establishing guardrails and promoting cost-awareness among engineering teams were crucial steps in this transformation.
Mike Norton [38:43]: "Fireproofing, to my mind, is about being in the conversation early... to help guide those technical decisions that are made."
To maintain control over the cloud environment, PBS began developing Terraform modules that standardize configurations, such as S3 bucket setups with proper tagging and access controls. This approach balances autonomy with governance, ensuring that engineering teams can innovate while adhering to organizational standards.
Mike Norton [31:55]: "We're trying to build Terraform modules... make sure it's tagged... so you're not accidentally setting it up with public access."
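As a rough illustration of the guardrail idea described above, the check below flags a bucket configuration that is missing tags or leaves public access open. It is a minimal Python sketch, not PBS's Terraform modules: the `BucketConfig` type, the `guardrail_violations` function, and the tag names in `REQUIRED_TAGS` are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field

# Hypothetical tag names; the episode does not say which tags PBS requires.
REQUIRED_TAGS = {"team", "project", "cost-center"}


@dataclass
class BucketConfig:
    """Illustrative stand-in for the inputs a standardized bucket module might take."""
    name: str
    tags: dict = field(default_factory=dict)
    block_public_access: bool = True


def guardrail_violations(cfg: BucketConfig) -> list:
    """Return human-readable problems; an empty list means the config passes."""
    problems = []
    # Every required tag must be present so the bucket's cost can be attributed.
    missing = sorted(REQUIRED_TAGS - set(cfg.tags))
    if missing:
        problems.append("missing tags: " + ", ".join(missing))
    # Public access has to be blocked unless someone deliberately opts out.
    if not cfg.block_public_access:
        problems.append("public access is not blocked")
    return problems


if __name__ == "__main__":
    risky = BucketConfig("legacy-site", tags={"team": "web"}, block_public_access=False)
    for problem in guardrail_violations(risky):
        print(f"{risky.name}: {problem}")
```

A check like this could run before a deployment is approved, so that an untagged or publicly readable bucket is caught early rather than discovered on the monthly bill.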
A standout moment in the discussion was PBS’s feature on a Google Doodle, which exposed their on-premises infrastructure’s limitations. This incident underscored the necessity of cloud adoption, prompting PBS to migrate to AWS swiftly to handle sudden increases in traffic without compromising service quality.
Mike Norton [41:43]: "We moved that website to AWS and problem solved. It allowed us to pivot and make things work under unexpected loads."
Throughout the episode, Mike underscores that PBS's decisions are deeply rooted in their mission rather than purely financial considerations. While cost-efficiency is critical, PBS prioritizes delivering valuable educational and entertaining content to viewers, sometimes making choices that align more closely with their mission than immediate cost savings.
Mike Norton [36:55]: "We are doing things because it's the right thing to do, not because it's cost-effective."
The conversation highlights the delicate balance PBS maintains between cost, performance, and resilience. By leveraging AWS's scalable infrastructure and fostering a culture of cost-awareness, PBS ensures that they can meet the evolving demands of their audience while staying true to their mission and managing their budget effectively.
Werner Vogels [09:02]: "These are business decisions, not technologists making the decisions for the business."
Episode #697 of the AWS Podcast paints a compelling picture of how a mission-driven nonprofit like PBS navigates the complexities of cloud infrastructure. Through strategic adoption of AWS services, cost optimization, and fostering a culture of proactive engineering, PBS exemplifies frugal architecture that maximizes every penny spent while delivering exceptional value to its viewers.
Mike Norton [48:15]: "We are here to serve the American public... knowing that we're inspiring these people and educating adults and children."
This episode serves as an invaluable resource for developers, IT professionals, and organizations aiming to optimize their cloud infrastructure while staying true to their core mission.