A (17:12)
The story is so, like, unfortunate that I feel like I just need to say it out loud. Scott put everything on the line. He used his savings, his mom's inheritance that she got from selling her business, and he even took a mortgage on his house, all to invest in this business, which is just something people do to start a business. Fair enough. But now he's in handcuffs, and he thought maybe the court would sort things out for him. But in reality, he was up against one of the oldest and most powerful institutions in the UK. The odds were stacked against him. They had their own prosecution branch. And from here, things only got worse. I get it. I'd be hopeful if I were Scott. I'd probably think, you know, what's the worst that could happen? Maybe I'll have to sell the store, but at least I can move on from this nightmare. I thought this was going to be my job, but now I just want to get out of it. But in fact, it didn't work out that way. It got much, much worse. But what had just happened? Where was this money actually going? Let's start with the stamps, the first big blow that Scott took. Imagine it's 2008. It's the end of the day. The receipt printer is still warm. There's a line of rubber bands on the counter from where Scott's been bundling up the stamps. This is the end-of-day tidy-up time: count out the stamps in the drawer, tell the computer what you've got, make sure all the numbers line up on the screen. It's a simple form, you know, the amount of stamps in. He scans the tray, he taps enter. The screen hesitates. Maybe the computer freezes. Maybe it does that sometimes. So he hits enter again, because that's just what you do. But here's what I think happened. The system would freeze and then it would repeat. It would play back your key presses so that one stock-in entry turns into two. The screen didn't warn him that it was frozen. It just added another batch of stamps to his ledger.
So that means in the drawer, there's what he actually unpacked, there's today's stamps, but in the computer there's today's twice. What that means is, at night's close or the next day, they think he's holding more stamps than he does, more than physically exist. It's not missing cash exactly. It's just ghost stamps that the system insists should be there, or the money for them. And that's how you wake up owing £1,750 in stamps that you've never seen. It's not that Scott skimmed anything. It's because the counter said hit enter, and he hit it, and the screen hiccuped, and the software created a version of the world that didn't actually exist. Now, those bigger discrepancies, I have some theories too for how they could start. It's the same type of glitch, just on a larger scale. Say a builder brings in £2,500 and Scott enters it, and the cursor hesitates, so he hits enter again. It's a busy store. He's got to move on. One real deposit now becomes two entries. And then an hour later, a cafe owner comes in with £1,500 to deposit. Same pause, same double enter. And no one would notice this in the rush. The cash is there, it's counted and it's real. But at closing, it says that there's £4,000 more cash in than is actually in the drawer. It's just two ghost deposits, right? It's not theft, it's just buggy software. It's not innovative. There's a system of checks and balances that should be in place. But the reason things failed here has to do with how this software was built. In the 90s, the UK government set out to modernize every Post Office counter. They wanted to get rid of old paper benefit books and they wanted to switch to a card system. So they brought in this company called ICL Pathway to handle both jobs. They're going to put, you know, a computerized point-of-sale system in every branch, in every Post Office store, and they're going to move all their benefits payments online.
There's two pieces to the system, the Post Office and the benefits. The benefits part gets cut. The whole thing doesn't go well. There's delays, there's fights over cost, there's changing requirements, but somehow the counter system survives. And that's the system that's running Windows NT at Scott's office. The project is seen as a huge failure. But they can save this Post Office part and maybe things will be better. Newspapers write up stories about all the wasted money of this project, but it still rolls out. And even without the benefit cards, putting computers on every counter still feels like progress. It sounds like a sunk cost problem. They put all this money into this failed project and surely they can save some of it by rolling out the small piece of it. And because this was built in the 90s, you know, it has dial-up modems and it has unreliable connections and thousands of tiny shops that need to communicate to home base. So the system was built offline first. Every branch got their Windows NT box, and it was hooked up to scales and a barcode scanner and a receipt printer and a messaging layer that was called Riposte. So if the network went down, you could still serve customers. Transactions just queued up locally, and then when the connection was back, the data was synced. For the time, that was a smart trade-off. The Internet was not reliable. But it is a trade-off. When you have this sort of store-and-forward system, your truth comes from a pile of queued messages on various machines. And they can get delayed and they can get retried and they can even get replayed. These are just the problems of a distributed system. Most days everything works fine and the ledger looks clean. But every so often, maybe it doesn't work out. Most days you never notice any of this. You sell stamps, you pay out pensions, you take deposits. The cash drawer has the money in it, the terminal has its numbers.
And at the end of the day, those two sets of records are supposed to match up. But when they don't, you're left staring at two realities: what's in the till and what's on the screen. How do you reconcile that? You might think, like Scott did, that the numbers have to balance out eventually. If a deposit got doubled somewhere, someone should end up with twice the money in their account and that should be flagged. There should be discrepancies that show up somewhere. Double-entry accounting is supposed to catch these things. You can't actually just create money out of nowhere. But I actually looked into this. While the ledger system that tracked what Scott made and owed each day was offline first, the banking transactions were live, in real time. There are real-time communications with the bank. So it's very possible the money was deposited once, but because of a double press or because of a network hiccup, there were two records in Scott's system for it. And somewhere these numbers must get reconciled. The money transferred into somebody's account, you know, should line up with the aggregate of data across all these post offices. So in fact, somewhere it should all shake out and even out. But not in any place or on any timeline that actually helped Scott. Most days, the software worked fine, but there were, it turns out, plenty of known bugs, enough to cause real mistakes. And behind the scenes, people at Fujitsu were scrambling to keep things running. They were patching issues, they were finding ways to update the ledger, forcing the numbers to add up and be correct. But Scott didn't know any of that. All he saw was the computer telling him that he should have money that he did not in fact have. And then the auditors seeing the same numbers and jumping to their own conclusions. Hey, here's a small-town guy who's stealing from us. Let's make an example of him.
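To make that reconciliation idea concrete, here's a minimal sketch in Python. It is not how Horizon or the bank actually worked. The `(reference, amount)` record shape, the function name `find_ghost_entries`, and the sample figures (borrowed from the builder and cafe example) are all assumptions for illustration. It compares the branch's entries against what the bank actually received and reports whatever is left over.

```python
from collections import Counter
from decimal import Decimal


def find_ghost_entries(branch_entries, bank_entries):
    """Return branch-ledger entries that have no matching bank record.

    Each entry is a (reference, amount) tuple. Both the tuple shape and the
    idea of a shared reference are assumptions made for this sketch.
    """
    # Counter subtraction keeps only entries that appear more often in the
    # branch ledger than in the bank's records.
    unmatched = Counter(branch_entries) - Counter(bank_entries)
    return sorted(unmatched.elements())


# Hypothetical day: each real deposit was recorded twice at the branch.
branch = [
    ("builder", Decimal("2500")), ("builder", Decimal("2500")),
    ("cafe", Decimal("1500")), ("cafe", Decimal("1500")),
]
bank = [("builder", Decimal("2500")), ("cafe", Decimal("1500"))]

ghosts = find_ghost_entries(branch, bank)
ghost_total = sum(amount for _, amount in ghosts)
print(ghosts, ghost_total)
```

With these made-up records, the leftover entries add up to the same £4,000 gap as in the example. A check like this points at the ledger, not at the person behind the counter.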
If you're from the UK, you might have heard parts of the story before, maybe not about Scott, but about the 13 postmasters who took their own lives after facing similar accusations. Scott didn't take that way out, but his life was definitely turned upside down by all of this. And we'll get into that. But what about the software itself and the people who maintain it? How could an organization that took this failed software project and pushed it out, and was constantly fighting bugs and drowning in errors, turn around and aggressively prosecute people who were affected by those bugs? Let's rewind. After Horizon was created, but before it got Scott put into handcuffs, before it got him splashed across papers as a thief, the company who created it was acquired by Fujitsu, and so Fujitsu held the maintenance contract for the software. Scott had no idea. But Fujitsu engineers already had a name for a bug that seemed a lot like the one that was draining his account. They called it Callendar Square, after a Falkirk shopping center where they first spotted it. On September 15, 2005, a subpostmaster at the Callendar Square post office tried to move stock from one counter to the safe, but the transaction just seemingly disappeared. Wanting the books to balance, he tried it again. But what he didn't know is Horizon's Riposte messaging layer had frozen. It had a message, "timeout waiting for lock." And when the terminal was restarted, when the lock was finally cleared, it replayed that queued message. Suddenly, both versions of the transfer showed up. Two transfers in for a single transfer out. In double-entry bookkeeping, which I'll touch on at some point, for every transaction there is both an in side and an out side, and this is a careful check on things. But on paper, this branch suddenly had a surplus in one account without a matching shortfall in the other. And because of that, the operator, the subpostmaster, was on the hook to repay the difference.
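That replay is the classic hazard of a store-and-forward queue. A common guard, sketched here in Python under my own assumptions (the `BranchLedger` class, the message shape, and the ID scheme are hypothetical, not the real messaging layer's design), is to give every message a unique ID and apply each ID at most once.

```python
import uuid


class BranchLedger:
    """Toy ledger that applies each queued message at most once."""

    def __init__(self):
        self.applied_ids = set()
        self.stock_in_safe = 0

    def apply(self, message):
        # A replayed or retried message carries the same ID, so it is skipped.
        if message["id"] in self.applied_ids:
            return False
        self.applied_ids.add(message["id"])
        self.stock_in_safe += message["amount"]
        return True


def new_transfer(amount):
    """Build a transfer message with a fresh unique ID (hypothetical shape)."""
    return {"id": str(uuid.uuid4()), "amount": amount}


ledger = BranchLedger()
transfer = new_transfer(500)
first = ledger.apply(transfer)     # applied normally
replayed = ledger.apply(transfer)  # simulated replay after the lock clears
print(first, replayed, ledger.stock_in_safe)
```

Note that this only covers a replay of the same message. A manual re-entry, like the subpostmaster trying again, creates a new message with a new ID, so the screen still has to tell the operator clearly whether the first attempt went through.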
Fujitsu logged this failure as Peak PC12642, and a few days later it happened again. And then it was given a different number, and both incidents landed in their internal error logs. So they put the incidents in their known error logs and they gave advice to the support people at the Post Office: if somebody reports this problem, tell them to reboot the machine and, whatever they do, don't enter it again. Internal emails at Fujitsu admitted that this lock bug had been showing up at a number of sites most weeks, going back as far as 2000. But the subpostmasters were never warned. Fujitsu just kept the known error log to themselves. So if this is what happened to Scott, and if he managed to reach the Post Office before a restart or whatever occurred to get the double posting, the staff there wouldn't necessarily know what to tell him. But it's wild that the folks running the Horizon system already knew this bug inside and out by the time it happened to him. But the actual subpostmasters dealing with this were kept totally in the dark. And that was just one of the issues, right? There was another one called the remming out problem, remming out being short for remitting out. That's the end-of-day routine where you've got too much cash on hand, and you seal the extra money in pouches, log it into the system, and then a van comes and picks it up. Basically, you're moving money from cash on premises to cash in transit, right? You don't want too much around so that you don't get robbed. You can imagine an end of day. Scott, on a busy pension day, has too much cash on hand. So he follows the routine. He prepares these pouches. Each has £10,000 in it, in £20 notes. Each bag gets sealed and it has a barcode on it. And in Horizon, he's supposed to enter that he has this £10,000 bag and then he has the second £10,000 bag, and it should subtract £20,000 from the branch's holdings and add £20,000 to the pouches ready for collection.
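As a rough sketch of what that routine should do on the books, here is a tiny Python ledger. Everything in it is an assumption for illustration (the `Ledger` class, the account names, the £25,000 opening balance); it is not Horizon's code. Each pouch is posted as one transfer with two matched sides, so the total across accounts cannot change.

```python
from decimal import Decimal


class Ledger:
    """Toy ledger where every transfer is posted as a matched pair."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.journal = []

    def transfer(self, source, destination, amount):
        """Post one transfer: take from source, add the same to destination."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.journal.append((source, -amount))
        self.journal.append((destination, amount))
        self.balances[source] -= amount
        self.balances[destination] += amount

    def is_balanced(self):
        # Matched postings sum to zero. Anything else means money was
        # created or destroyed on paper.
        return sum(amount for _, amount in self.journal) == 0


# Hypothetical opening position for a busy pension day.
ledger = Ledger({
    "cash_on_premises": Decimal("25000"),
    "cash_in_transit": Decimal("0"),
})

for _ in range(2):  # two pouches of identical value, each posted in full
    ledger.transfer("cash_on_premises", "cash_in_transit", Decimal("10000"))

print(ledger.balances, ledger.is_balanced())
```

The point of the design is that there is no way to post only one side of a pouch. If the premises balance and the transit balance ever disagree about how much left the branch, the journal no longer sums to zero and the system itself is what needs investigating.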
But this remming out bug, which sounds really bad: if you did two bags and they had the same amount in them, Horizon only subtracted the first one from the branch's holdings, even though both bags showed up going into the van. In other words, the van would get their £20,000, but the branch would say it had only taken out £10,000. When people talk about balancing the books, this is what they're talking about. Both sides need to match. You can't take out 10,000 here and deposit 20,000 over there. It doesn't make any sense. But that was the bug. Both bags left the branch, both were in the van, but the system acted like only one had gone. And so on paper, it looked like at the end of the day there was £10,000 of cash missing. It's like the ghost stamps, only this time the numbers are much bigger. Yeah, that's the reason double-entry accounting exists. Every transaction gets recorded twice, once as a debit in one account and once as a credit in another. And those two need to balance. If they don't, you've either created or destroyed money out of thin air. And this isn't a new idea, right? The idea of recording everything twice goes back to merchants in Renaissance Italy. In the 1400s, they were using double-entry bookkeeping. If you've ever written code, double-entry accounting might feel familiar, right? It's basically a 15th-century version of, like, a two-phase commit. You can't close the books until both sides acknowledge the change has happened. You have, like, two physical machines separated on a network and you're taking something from one and moving it to the other. Both sides need to confirm that they've gotten that change, or it didn't actually happen. If one side never acknowledges, or if things just hang, then it doesn't count. It's also kind of like test-driven development, right? Every code change needs a matching test. One side needs to match the other. If the logic in the test or the logic in the code you added is incorrect, something will fail.
And that's a sign you need to figure out what's going on. There's so many metaphors for this. The other way to think of it is like a checksum, right? If a checksum doesn't pass, then the data's corrupted. But really the system should not allow you to have a debit in one account that doesn't match a credit in another. It's just a simple integrity check. And instead of investigating and blaming the system for breaking basic accounting rules, somehow the finger gets pointed back at the subpostmaster. That's the remming out bug. And in February 2007, Fujitsu reviewed this bug and they found internal notes showing 49 branches were hit in that month. And maybe because this one is obvious and doesn't balance, they did remotely access some of these branches' machines and try to fix up the ledger entries. We don't know if Scott's branch was one of them. We don't know if this was the bug he hit. The details just aren't available. What we know is that in some cases, Fujitsu was working behind the scenes to try to correct these errors without telling the contractors like Scott, or even telling the Post Office itself what was going on. And there were so many bugs like this. There was an earlier bug from May 2005 called the Kell G Maxwell 38 5P. All we know about that bug is it says "possible bug in counter code." But they were never able to pin down what happened. There was never a change. We don't know which post offices were affected, and we don't know what the fallout was, because we're piecing this together after the fact. And there were plenty of other issues. And honestly, we'll never know what really happened to Scott because no one bothered to look. The problem was there's so many layers. There was the software company doing the maintenance fixes. They built the software. They don't want to talk about bugs. There's the support people at the Post Office and they're overwhelmed. And that's why, the first time the numbers didn't add up, Scott did what anyone would do.
He called for help, and he gets the queue music. And then he gets an unsympathetic support worker who's working through a script. Check the till, recount the stamps, maybe power cycle things. Have you tried closing the session and then reopening it? It's hard to say whether the agent is even really listening or just working through a script. One thing's for certain, right? He reminds Scott that the contract says the branch must balance to roll forward. If Scott can't fix it, the difference comes out of his pay. It's in your contract, sir. You can spread it over two deductions if that makes it easier. Scott hangs up, feeling small. Not just that they've taken money out of his pocket, but that they don't trust him. If the computer says the stamps are there, then they're there. If they're missing from his drawer, then that's on him. So he pays, right? In that first case, he pays that £1,750 and he tells himself it's just a glitch and it'll work out, and that's fine. This is his business. He's excited. But then, yeah, a few days later, the numbers don't add up again and the gap's even bigger. I'm just playing this back in my mind, right? If he admits the shortfall, then they'll take the money right away. And maybe he won't be able to make payroll, or maybe he won't be able to pay the lease. So he's a businessman. He does what he needs to do to keep the business running. He forces the period to roll over. He tells the system that he has the money. He tells it what it wants to hear, just to make it through the night, make it to the next day. And that desperate entry to move forward is what was later called false accounting. That's what got him put in handcuffs. That's the moment where the system of prosecution decided that he was the villain and he was someone to blame. But here's what's interesting to me, right? Behind that maze that Scott couldn't see, there were real experts, the ones who could spot a software bug.
They were just hidden inside Fujitsu's back office and they were trying to fix things. Maybe they were working very hard. You know, they had a list of known errors, but those never made it out. And if the problem you had looked like something in their error logs, support might notify them, maybe it would quietly get fixed, I don't know. But if it didn't, or if no one checked, then you're stuck. And Fujitsu was swamped with these bugs, but they also kept them under wraps. This list of known errors, they kept that as an internal list. They never shared it with the Post Office support at all. So it's not just a software thing. It's about organizations and culture. The Post Office treated every shortfall as a personal debt against the subpostmasters. You either had to pay up or they took the money from you. And it seemed like there was some sort of quiet disdain where this big institution looked down on these village shopkeepers, like, the people in charge are in London while the subpostmasters are working in their villages. But there was, at least in theory, another option. If Scott had known the right phrases, and if he was willing to lose pay and to not just forcibly roll it over, he could have refused to roll over the period. He could have stood his ground, not entering anything, but not accepting their numbers. I don't know what would have happened then. But what I'm imagining is maybe he eats the cost on that first time, but the second time he goes all forensic accounting on them. He starts writing down every transaction. He starts taking screenshots: who pressed what, what happened, where. He starts a formal dispute process with them, says that he wants to report a system defect, is very clear about his words, and is very demanding of an audit before any penny is taken from his pay. If he had known that the software had so many issues, I mean, which of course he didn't.
And if he had taken the time, maybe he could have shown them, or maybe not, but maybe he could have written to his MP, maybe he could have got a lawyer to send them a letter, and maybe, just maybe, that would have pushed them to look into it, to get off the script. And then somebody would stop saying, it's your problem if the numbers don't balance, you just need to pay. If he could get people's attention like that, maybe he could get the issue escalated. Maybe then Fujitsu would have stepped in, they would have taken a real look, maybe they would have straightened things out. I do think it's possible. But think about what this really means. Scott's got to operate this business, and all of a sudden now he's got to be a legal expert, be a forensic accountant, be a site reliability engineer, and use some sort of bureaucratic kung fu to get people's attention while customers wait in line and want to get their pensions or want to get their packages. He's supposed to risk his payroll and his reputation and all this hope on the fact that he could make some change happen. And there were 14,000 subpostmasters and many of them were having problems. So it's a lot to get above the noise. And when you're contacting the support line, their job is to get you off the line and move on to the next one. And Scott didn't have a map to all of this. He didn't know that all this was going on. All he had was this useless support line and his lease payment and this cash drawer that never matched the numbers on his screen and these people telling him he had to pay. And so the next time, he just entered what the computer wanted him to so that he could open up his shop and he could do his business. It's a choice that's completely human and totally understandable, and one that would get him arrested. I think it's interesting. Sad, but interesting, how you can look at the details of this and see how it ended up where it did.
Horizon is a textbook case of how big software projects go wrong. Yes, the goal was to modernize every Post Office counter and replace benefit books with a payment card. But government projects like this have a bad track record. The bigger the project, the lower the chance of success. And this project was one of the largest IT contracts in European history. As I mentioned, on paper this project was supposed to do two things: the welfare payments and computerizing the accounting. But that involved two different agencies and two different sets of requirements. And it all was in one contract and it went sideways. Patrick McKenzie, patio11, he's covered stories like this before. He says government software projects fail for pretty predictable reasons. He says all systems reflect the culture they are created in. No system of importance can be accurately described without the context of the culture that created it. In other words, the institutions and the culture of how things are done are the hard part of government software, not the technical details. Because maybe they could have straightened out the technical details, but things were already a tangle. There were overlapping institutions and there were conflicting incentives between the software company and the subcontractors and the Post Office. And everybody had a contract and everybody was working to contract. And when you build software to contract, you get something that hits the checkboxes, that has the process, but maybe is not working. The problem is that government procurement processes don't reward working software. They reward compliance and following the RFP and auditability. It's like ordering a car with a parts list. You can check every box for all the pieces of a car, but end up with something that doesn't actually get you anywhere. And then, because institutions hate admitting failure, they basically can never admit failure.
The easiest path to salvation when the welfare project failed was just turning this whole thing into the Horizon postmaster system. This software project is tragic, but it's also kind of fascinating. There's just so many things that went wrong, and I can't possibly go over everything that went wrong here. And as patio11 said, it's more cultural than being a specific person who made a specific error in a specific place. But for one interesting example, imagine you're going to roll out this system. It's a nationwide, offline-first point-of-sale system with active users across every small village and major city in all of the UK. In other words, it's a lot, right? And there are a lot of ways to roll out a system like this. If you're forcing the use of a failed project to save face, you should consider maybe rolling it out piecemeal, doing a canary deploy of some sort, or doing some sort of gated rollout. Try to use the software in a small number of Post Office owned stores. Keep a very close eye on it in small numbers like that. Maybe just one slow store to start, but really spend time and make sure each issue is resolved and investigated. There were actually 115 post office stores that were owned and operated by the Crown. And so that is a feasible plan. Just do those 115, investigate every problem, maybe run the system side by side with the old system and see how it lines up. That's something I would suggest. But the institutional reality of large government organizations pushes projects of this scale towards big bangs. The software is done. We had checklists, and all the checks have been checked. No one is going to raise their hand to say, oh, actually there's this problem over here. So when rollout started in 2000, when these Horizon terminals, when these Windows NT boxes were bolted to scales and given barcode scanners and receipt printers, it was rolled out to all 14,000 village post offices all at once.
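Here's a minimal Python sketch of that kind of gated rollout. The stage sizes reuse the numbers above, but the `next_stage` function and the rule of "no open discrepancies before advancing" are my own assumptions about how such a gate could work, not anything the Post Office did.

```python
def next_stage(stages, current_index, open_discrepancies):
    """Return the index of the rollout stage to run next.

    The rollout only advances when every discrepancy found in the current
    stage has been investigated and closed (a hypothetical gating rule).
    """
    if open_discrepancies > 0:
        return current_index  # hold here until the issues are resolved
    return min(current_index + 1, len(stages) - 1)


# Illustrative stages: one slow store, the Crown-owned branches, everyone.
stages = [1, 115, 14000]

held = next_stage(stages, 0, open_discrepancies=3)
advanced = next_stage(stages, 0, open_discrepancies=0)
print(stages[held], stages[advanced])
```

The design choice is that an unexplained shortfall blocks the rollout instead of becoming somebody's debt. A big-bang launch has no stage where that question can even be asked.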
And I'm assuming that's because, before rollout, we decided that the software was correct and perfect for the task. I'm using air quotes here, but you probably can't see them. But because we decided it was correct and perfect, except for some known issues that Fujitsu is actually keeping to themselves, there's a simple rule, right? If your books don't balance, it's because of you and not the software. And so you must pay the difference. And that's why, as Scott was taking over his post office shop, Horizon had all these failure conditions perfectly lined up, all the things that patio11 warns about. So by early 2000, as Scott was taking over his village shop, Horizon had all, you know, the hallmarks that patio11 warns about. A contract that's optimized for process over getting the right outcome. Lots of people who can veto things, but yet no single accountable owner. An architecture that amplifies small glitches into accounting discrepancies, and an institution that's unwilling to admit that there might be faults. So when those glitches hit, the entire weight of the institution tilted towards prosecuting the subpostmasters. Because to admit otherwise would be to admit that the project itself was a failure, right? That so much money was gone, that it was wasted. Or as Patrick McKenzie puts it, risk rolls downhill. If a ledger goes wrong, the people with the least power end up holding the bag because they can't prove who's at fault. That's the interesting thing to me. When Scott picked up the phone line for help, he didn't reach the people who actually built Horizon. He got the Post Office support. And their real job wasn't to escalate bugs. Their job was to keep things from ever reaching Fujitsu because of contracts and processes, right? Because sending things up to Fujitsu had a cost, and it had all the overhead and painful machinery of a giant government vendor relationship. So the defaults were simple, right? If Horizon glitched, that was Scott's problem.
If the till didn't balance, he had to make it good. Small issues never became system bugs, they became debts. Because nobody else wants to admit there's a problem. Fujitsu wasn't gonna eat the risk, the post office wasn't gonna eat the risk. So all the risk just rolled down onto Scott. And that's why things did not go well for Scott. Right? He ended up in handcuffs and he was put in front of a judge in court.