Transcript
Adam Gordon Bell (0:02)
Welcome to CoRecursive. I'm Adam Gordon Bell. Today's episode is the story of a piece of software being built. So my wife and I, we often have this debate about people and their character. Sometimes it's high level, like, can you separate Michael Jackson's music from what he did? Is Billie Jean a great song or is it tainted? But usually it's more specific. Sometimes people give her bad vibes and I'm not always a good judge of character. I give people the benefit of the doubt. Sometimes she knows right away what she thinks about somebody and she'll know that something is off, and she's not always right. But I mean, she's not gonna listen to this. So, yeah, actually she's always right. It just usually takes me time to realize what she's talking about. But anyways, this is going somewhere, I swear. So it's late January 2024 and I'm at my desk following one link to another. I end up on the Linux kernel mailing list, the LKML. Right. Most of it is actually patches. There's like an email with a title, "merge tag timers v6", then a message: "Happy New Year. Please consider pulling these changes." And then it's a literal patch, like a diff: plus this line of code, minus this line of code. And you can apply these patches to your own Linux source, if you so desired, using git apply. It's like a pull request workflow. But yeah, that's not the link I get sent. The link I follow starts like this.
Hans Reiser (1:36)
I was asked by a kind Frederick Brennan for my comments that I might offer on the discussion of removing ReiserFS V3 from the kernel. I don't post directly because I am in prison for killing my wife Nina in 2006. I am very sorry for my crime. A proper apology would be off topic for this forum.
Adam Gordon Bell (1:55)
That's Hans Reiser. The voice is OpenAI. Everyone in today's episode but me is going to be computer generated. But yeah, the letter is a response to a prompt from Frederick Brennan. ReiserFS is being deprecated from the kernel, meaning it's on its way out, it's obsolete, it's going to be removed. And Hans Reiser, the creator, well, he's in prison for murder. And actually the file system and the man and the murder, they're all really bound up together, they are linked. And so the letter goes on. He points people to Reiser4, which is a newer version than the one in the kernel. He says it's a more maintainable basis for the future of the file system. And then he goes on and on and on, right? Thousands of words about the technical challenges, the interpersonal conflicts, the mistakes made, the dreams he had, the life that he lost. It's an unexpected thing. A letter from a convicted murderer on a technical mailing list that is usually just patches, a glimpse into the human story behind the code. It's a man trying to explain himself and to grapple with his past, and a man wondering if redemption is possible, because he started on this whole path just with some simple improvements he wanted to make to file systems. There are lots of ways to tell the story of ReiserFS and of Hans and of Nina, his wife and victim. There are literal whole books written about the subject. But my way of telling the story, well, I wanted to tell you about how you can't separate the person from the work, the person from the code. You can't separate the technical from the social. You can't be a monster in one domain and not have it be part of the others. It's all mixed up together. And that is today's story. So picture this. It's the 1990s. The Internet is taking off. Linux, this free open source operating system, is gaining traction. Programmers are building all sorts of new things. Websites, applications, tools, and all of these things, especially on Linux, they're made of files, lots and lots of files.
And how does Linux keep track of all these files? Well, with the file system, of course. And back then, the popular Linux file system was ext2 and it was like, okay, right? It worked, but it had problems. If you didn't know, under the covers a file system is kind of like a librarian for your disk. It helps organize things. If I want to add a new book to my collection, the librarian has to find an empty shelf and put it there. But then the librarian also has to update a card catalog with details of where it put that book, or it'll never be able to find it when I ask for it in the future. This card catalog, this index of what's on the file system, is very important. It's like your directory listing. Now imagine you're doing this and then suddenly, boom, the power flickers, the lights go out, there's a power failure. When power comes back on, what's the state of your library? Right. Maybe the librarian found shelf space, but never got to put the book there. Maybe the book's on the shelf, but the librarian didn't get to the catalog. Maybe they were halfway through writing the card catalog entry and everything's in this weird half-finished state. When your computer crashed, and they did all the time back then, you'd end up with a mess, right? Files marked as stored, but they're not actually there. Files that are stored but not properly recorded. It was chaos. So if your computer crashed, you had to run this thing, file system check, fsck. And it could take hours because it's going through all this data and correcting it. It could take literally hours. On a big disk, it could take all day. And disks were getting bigger. Then there were performance issues. Big directories would have lots of files in them, and ext2 would slow to a crawl. It used linked lists to organize these directories. And if you've done your coding exercises, you know that going through a linked list node by node, it can take a lot of time. But yeah, it's 1993. The dot-com boom hasn't happened yet.
There hasn't been a Netscape IPO. And Hans, he's in Oakland, California. He's across the bay from San Francisco in a cluttered home office filled with computer monitors and stacks of books and the hum of cooling fans. And he cares about open source. He cares about Linux a lot. He wants to build a better file system. Faster, more efficient, more elegant than anything out there. But building a file system is not really a one-person job. It could take a team. And Hans didn't have a lot of money. He was bootstrapping this effort. He was working a day job and pouring every spare minute he had into creating a new file system. And then he had this idea. The Soviet Union had recently collapsed and he had read an article about how Russian programmers, incredibly talented programmers, were working for next to nothing after this collapse. And Hans saw an opportunity, and so he immediately booked a flight to Moscow. And Moscow in 1993 is an interesting place, right? With this collapse having happened in '91, everything's changing. And here's this American programmer, this guy with a cowboy hat, walking around in a world he doesn't really understand. He literally wore a cowboy hat in Moscow to play up his Americanness. He's trying to build a team. He's trying to find these smart and cheap developers. He wants to communicate his vision and navigate this culture that's completely different from his own. He's this American in Moscow and he's sticking out and he's not blending in, he's making a statement about his Americanness. So he finds his programmers and he's doing this all on a shoestring budget. So he's paying these programmers a fraction of what they'd make in the US, but for them, it's a significant raise. And Hans is working his butt off to keep the money coming in. He's working at Synopsys and then Sun Microsystems, he's taking on contract gigs.
He's moonlighting at some research center in New Jersey, flying back and forth across the country in the US, flying back and forth between continents, between the US and Russia, just to fund this team, to keep his dream of this file system alive. And for a while, it seems like this is working. The team is making progress, the code is coming together. ReiserFS is taking shape. But, you know, distributed remote work is hard back then, so he has to travel back and forth between the US and Russia. He has to check on the team. He has to make sure the code is to his standards, that the algorithms are efficient. He's pushing them hard, and he's demanding excellence because he knows that there's no room for second best in a file system. But then there's the cracks in the foundation, there's cultural differences, there's communication barriers, and just the challenges of managing a remote team. And these things, they start to wear on him. He's used to getting his way, to being completely in control. And in Russia, he's having trouble with that control aspect, but he keeps it up for years. And fast forward to March 1998. Hans is back in Russia, but now he's in Saint Petersburg. He's at a cafe next to a canal. And he's meeting a woman, Nina Sharanova. She's a mail-order bride. And Hans is smitten with her. Her voice, her smile, her intelligence. She's a doctor, she's an OB-GYN. She seems to be everything he's looking for, an intelligent, smart woman. So they get married. It's a quick courtship, a hastily arranged wedding, and soon Nina is pregnant with their first child, Rory, who's born September 1999. And it's a happy new chapter for Hans. He's now got a wife and a kid, and he's got his team in Russia, and they're making great strides on their file system. The file system has journaling, which is an old idea. You know, before the librarian shelves that book and then writes the card catalog, the librarian writes down in a journal what they're going to do.
So if the power goes out, if something goes wrong, you can recover by looking at what's in the journal. ReiserFS, his file system, it also used B-trees to organize directories. So no slow listing of files. But the biggest trick of ReiserFS was that under certain conditions, it sort of created more disk space by doing things more efficiently. And that was a big deal. But also the price of Hans's ambition was starting to become clear. Because if you rewind, if you go back to the late '80s, before ReiserFS, before Namesys, the company he founded to build it, there were warning signs. Little and sometimes big social glitches. They weren't about the technology, not exactly. They were more about Hans. Because when Hans was at UC Berkeley, he was part of the student-run group called the Open Computing Facility, the OCF. And it's down in the basement of Evans Hall with rows of humming computers with fluorescent lights buzzing overhead. And for Hans and many others, it's a haven. A place to code, to build, to create. A place dedicated to open source and open access. The OCF is volunteer-run and Hans gets very involved. He even manages to secure a donation of workstations. But the OCF is not just about technology, right? It's a shared space, it's a community, it's about people working together, sharing ideas, sharing resources and building something bigger than themselves. It's open source, it thrives on collaboration. And Hans, he doesn't really get that. He's brilliant, yes, but he's also got this kind of intense personality. He's arrogant, he wants control and he doesn't play well with others. There are all these stories, like the time he booted an undergrad off the system for posting a message on Usenet that he disagreed with. Or the time he physically assaulted a colleague after some disagreement. Or the meeting minutes that have titles like "Hans Complains and the Earth Shakes". These weren't just isolated incidents, these were a pattern.
One former user put it this way: he acted as if he owned the Open Computing Facility and that everyone should kowtow to him. Another said he went out of his way to be mean and petty and arrogant and small-minded. These are signs, right? Signs of a person who's not well integrated. Signs of a person who lacks emotional intelligence. Signs that are often rationalized away when someone is brilliant, when someone is talented. But yeah, Namesys and Hans, by the time they got to version 3 of ReiserFS, they were really onto something. The Linux kernel version 2.4.1 included ReiserFS as an option. And all of a sudden this code had distribution. And since it was the first Linux file system with journaling, it was a solid choice for people to use. But yeah, the thing that made it really exciting was a Namesys, or Hans, innovation called tail packing. I feel like I'm going to get a little tired of this librarian metaphor, but here you go. Imagine that our librarian's shelves are divided into slots that are the size of a medium-sized hardcover book. So we call that a block. And most file systems have four-kilobyte blocks. That's how a hard drive works, right? Each little area is a block. We address them by some sort of index. The librarian in the card catalog is actually writing down the address of the blocks where the book is stored. And if the book is larger than the block size, then the librarian just splits up the book and puts it in as many blocks as it needs. Four kilobytes is actually pretty small. So many books are split up across many, many blocks. And fragmentation, if you remember running defrag on your Windows machine as a kid, like I do, fragmentation is when the books that are big and need to be split up are split all over the library, right? There's only so many open spaces. And so the books get all scattered in blocks all over the place. And defrag is the process of rearranging these, right?
So things that you use together are next to each other so that you can read them sequentially and everything happens faster. But tail packing is different. It's maybe some intentional fragmentation. It's this technique for dealing with small files. Because when you have all these books that are a bit bigger than a block, you get these little tails, these little parts that are too big for that initial block. And instead of storing them with the rest of the book, you store them all together. You pack all these little ends of books together into one block. This effectively gave you more space, especially if you had lots of small files. Because imagine without tail packing, if you were storing pamphlets instead of books, you know, storing one pamphlet per block, you're leaving all this extra space, but you could just pack a whole bunch of those pamphlets into one block and then you save space. It was brilliant, right? Suddenly you had this extra space on your hard drive if you had a lot of small files. No file system checks because of the journaling, and more space. It was a significant improvement on ext2 and the Linux community loved it. Companies issued praise.
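
To put rough numbers on the pamphlet example, here is a minimal Python sketch of the block arithmetic behind tail packing. It is a simplified model for illustration, not how ReiserFS actually lays out data on disk: the function names are hypothetical, the 4 KB block size is the one mentioned in the episode, and the first-fit packing strategy is an assumption made only to keep the example short.

```python
BLOCK_SIZE = 4096  # bytes; the 4 KB block size mentioned in the episode


def blocks_without_packing(file_sizes):
    """Every file gets whole blocks to itself, so its last block is often mostly empty."""
    # -(-a // b) is ceiling division on integers.
    return sum(-(-size // BLOCK_SIZE) for size in file_sizes)


def blocks_with_packing(file_sizes):
    """Full blocks are stored as usual; leftover tails share blocks (first-fit)."""
    full_blocks = sum(size // BLOCK_SIZE for size in file_sizes)

    # The tail is whatever does not fill a whole block. Files that are an
    # exact multiple of the block size have no tail.
    tails = sorted(
        (size % BLOCK_SIZE for size in file_sizes if size % BLOCK_SIZE),
        reverse=True,
    )

    free_space = []  # free bytes left in each shared tail block
    for tail in tails:
        for i, free in enumerate(free_space):
            if tail <= free:
                free_space[i] -= tail  # tail fits in an existing shared block
                break
        else:
            free_space.append(BLOCK_SIZE - tail)  # open a new shared block

    return full_blocks + len(free_space)


if __name__ == "__main__":
    pamphlets = [500] * 1000  # a thousand 500-byte files
    print(blocks_without_packing(pamphlets))  # one block per pamphlet
    print(blocks_with_packing(pamphlets))     # eight pamphlets share each block
```

In this model, a thousand 500-byte pamphlets need 1000 blocks when each one gets its own block, but only 125 blocks when eight of them share a block. The saving shrinks as files get larger, because tails become a smaller share of the total.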
