
What if the tools you trust were actually betraying you? Join us for a riveting story where a team of software developers discovers that their compiler is compromised. What starts as suspicion of a simple bug quickly escalates into the alarming...
Adam Gordon Bell
Welcome to CoRecursive. I'm Adam Gordon Bell and this is episode 100. 100 episodes. Knowing how long it takes me to make each one, getting to 100 is almost unbelievable. And today, something special: I'm going to share a story with you. It's a fictional story about debugging code, and I don't want to spoil it, but when I first read it, it blew my mind, and I knew I wanted to somehow include it on the podcast, but I was never quite sure how. Well, today I figured it out. That's what we're going to do. It's a story from Lawrence Kesteloot, a story about a team of software developers who are forced to challenge their understanding of technology and themselves. Afterward, frequent guests Don and Crystal are back with some hundredth-episode reflections. But first, join Patrick, Dave and me as we try to debug some code in this fantastic short story. After a minute, Patrick returned with an old metal-case network switch. The room fell silent as he plugged it in. Our project and so much more was hanging in the balance. I stopped breathing as Patrick struggled to get the plug lined up with the port. I stared at the front panel lights and felt Dave doing the same. My eyes watered. Patrick plugged it in. The front lights immediately lit up and flashed actively. That was not good. Heat rushed into my hands and face, and Dave, about to say something, suddenly lurched for the nearest garbage pail and vomited. Three weeks earlier, everything was different. We were past the honeymoon phase of our project. That's when you move from exciting ideas to the hard realities of making them work. The simple elements from our design phase were proving to be unexpectedly complex, and not because they were interesting or challenging, but just due to minor unforeseen issues. Patrick and Dave had done the design, of course, and had humored me as I made various suggestions, which in retrospect must have been obvious to them.
It had been fascinating to hear them debate design issues. So many of their arguments were based on intuition rather than hard logic. I learned more each week than in my entire last semester of school before I started here, and things got really chaotic. I was mainly comfortable with Python and JavaScript. I knew them inside and out from computer science classes, but in this new job I was thrown into the world of C and low-level programming, and honestly it was a bit intimidating at first. But as I kept at it, things started to click and I started to understand the debates. I found myself siding more with Dave. Dave had a great laugh. He was a bit on the heavy side, but he dressed very well. I liked the insights he had about management and the process of software development. And he liked to teach me things. Patrick and Dave had split up the work they had designed, and they had given me tests to write. It had been a quiet day when I heard Dave whisper, what the hell? I sensed an opportunity to learn, so I walked over to his desk. He didn't notice. He was quickly switching back to his editor, adding a line of debug output, compiling, running, shaking his head, and then switching back again. What's the matter? He waited until the program finished running to answer.
Dave
I can't figure out this bug. I can't figure out where this number is coming from. Here.
Adam Gordon Bell
He pointed to a line in his debug output. Dave and Patrick liked to debug with print lines rather than stepping through in a debugger, as I liked to do. What happens when you step through in a debugger? This was a tease. He knew a lot more than I did, but I was always poking him about his old-fashioned workflows. Dave hesitated. Then, to my surprise, he opened up the debugger. I showed him how to set breakpoints, and together we dove into the tangle of his code. For the next few hours we traced this mysterious value back through his code base.
Dave
Patrick, can you help us here?
Adam Gordon Bell
This surprised me more than the debugger. Dave rarely asked Patrick for help. Patrick came over and Dave explained the situation.
Dave
The program is crashing and that's because of this bad dereference here. But the value is correct here.
Adam Gordon Bell
Dave pointed at some code while Patrick scanned the screen. Then a thought occurred to me. Dave was still explaining and I didn't want to interrupt. But my heart was jumping as I waited for Dave to take a breath. I think it's a compiler bug.
Patrick
Blaming the compiler is a last resort. Same with the standard library. Chances are vastly greater that your brand new code has a bug, rather than code being used by thousands of people.
Adam Gordon Bell
I nodded wisely, but I felt my cheeks blush. Patrick had a way of making me feel inexperienced constantly, though I doubt it was intentional. Once, when I proudly solved a minor bug, his response was a simple nod, and honestly, that felt good. His direct style often left me feeling eager to prove myself to him. Patrick was cracking his knuckles. He asked questions about threads and volatile and aliasing, questions that I didn't entirely understand, but I wished I had thought of them. Dave considered each and said that none of Patrick's concerns applied. Patrick pursed his lips to the right, the way he always did when he wasn't quite sure what he was going to say.
Patrick
Hmm, I don't know. Weird.
Adam Gordon Bell
And then Patrick returned to his desk. I was relieved that Patrick hadn't crushed us with an obvious solution. And I guessed that Dave felt the same, as he sat back up from his slouch and started typing at his machine.
Dave
Is there any way to show assembly right here where things go pear-shaped?
Adam Gordon Bell
I showed him how to do this and wondered if he was giving my theory a chance. He typed the commands and the lines of C were split apart by dozens of lines of assembly.
Dave
That's not right. We got the command wrong.
Adam Gordon Bell
I'm sure that's right. I know the debugger well.
Dave
No, it's not. That's far too much assembly for this line of code. And some of these aren't even legal assembly instructions.
Patrick
That's not possible.
Adam Gordon Bell
Patrick was coming back over from his desk and Dave was blinking slowly at his screen.
Dave
Well, I mean, I've never seen these. They're not what a compiler outputs. This isn't the right place.
Adam Gordon Bell
Dave was flustered, and I felt even more inexperienced than before we started.
Dave
I'm... I'm burned out. We'll look at this again tomorrow.
Adam Gordon Bell
He left and I sat down at the computer. The only thing I had contributed all day was my knowledge of the debugger, and Dave even thought I got that wrong. I looked at the assembly on the screen. Some of the instructions were obvious and some were cryptic. I googled around for an assembly reference and started tracing the instructions, keeping notes about register contents in my notebook as I went. Dave was right. These instructions made no sense. Not only did they perform too much work for the corresponding C code, but they didn't even make sense among themselves. But I convinced myself that this was at least the right section, because a few of the instructions matched the surrounding lines of C code. I focused on one instruction in particular. It was subtracting a register from itself. That's not necessarily odd. It might be an efficient way to set a register to zero. But then this register was used in other instructions. The compiler must have known it was zero. It should have optimized it out. I hesitated, and then I called Patrick over. I explained what I found, and he stared at it for a good while.
Patrick
That's not a normal subtraction instruction.
Adam Gordon Bell
He was right. I looked it up more carefully and found that it was a variation that also subtracted out the carry bit. This was a way to get the carry bit into a register. I worked my way backwards to see where the carry was set. The code got increasingly convoluted and I repeatedly made incorrect assumptions that threw me off course. I turned around to ask Patrick a question and found an empty desk. Somehow I'd lost track of time and it was nearly midnight. I set the office alarm and rode my bike home. I got in late the next morning. I hadn't slept well and it had taken me hours to fall asleep. Each time I closed my eyes, I saw assembly instructions in large bright letters. At work, I walked straight to Dave's desk to discuss the previous night's investigations, but he wasn't interested.
Dave
You were right. It was a compiler bug. I tweaked the C code and I'm no longer triggering it. That weird assembly is gone.
Adam Gordon Bell
I was in the awkward position of having to contradict his compliment. I don't know if it's a bug. The code I saw wouldn't have been generated by a bug.
Dave
It was definitely a compiler bug.
Adam Gordon Bell
I had worked with Dave enough to know that the more confidently he spoke, the less secure he was. He was tired of being held up by this problem and so he just wanted to drop it. But later, Dave walked up to my desk with a grin and a cup of his favorite artisanal coffee. I thought all coffees tasted fine, but he was super picky, so I always pretended to be picky as well so he would invite me along to his artisanal coffee place. I actually felt a bit stung that he had gone without me.
Dave
You remember that compiler bug from Tuesday?
Adam Gordon Bell
Yeah.
Dave
Happened again this morning. Same file, same problem. That weird assembly is back. The strange thing is, I didn't change the C file.
Adam Gordon Bell
Maybe you changed the header file.
Dave
Oh, maybe. Actually, no. Let me check Git.
Adam Gordon Bell
He came back two minutes later. No relevant files had changed. Patrick walked over.
Patrick
This is no good. Here it's a crash, but it could be something more subtle. In four months, our system will be so complex that a subtle problem will take two weeks to track down. Can you look at the compiler's release notes and see if we can upgrade?
Adam Gordon Bell
I looked briefly and found what I expected. We were on the latest and most stable version. I went further and searched for some permutation of weird assembly or compiler bug and found nothing in the release notes. I compiled Dave's code on my computer and then disassembled it. The strange code was there, at the same place it had been on Tuesday. I recognized the subtract-with-carry instructions and a few others from my previous investigation. I also looked through the rest of the code. It was striking how the unusual instructions didn't appear anywhere else. I had an idea. I thought maybe I could figure this out by looking at the source of the compiler. I downloaded it and started poking through it. It was a tangled mess of compiler passes and plugin frameworks and embedded languages. I had never been so overwhelmed. I went straight to the files that described the translation from the abstract syntax tree to machine assembly. I did a grep for that subtract with carry and it wasn't there. I looked for a few of the other odd instructions. Most of them weren't there either. Pretty soon it was lunchtime. The small office we used had a patio, and on Friday we DoorDashed some deli sandwiches. I preferred the nearby Subway, but Dave hated the bread. While eating, I mentioned what I found.
Patrick
That's super weird. Which other ones weren't in the translation file?
Adam Gordon Bell
I don't remember. A few more that involved the carry bit. Some vector instructions.
Patrick
But that compiler doesn't do vectorization. Either the instruction was generated by mistake or you're disassembling non code.
Dave
It's definitely code. This is what's causing our bug.
Adam Gordon Bell
It's being executed and it's not generated by mistake. It's a math instruction, but I don't think it's being used for math.
Patrick
Neat.
Adam Gordon Bell
We had finally hooked Patrick. After lunch, he and I sat down at my desk and walked through the parts of the code I knew best. I showed him how, although the instructions were obscure and strangely used, they actually made sense all together. There was a deliberate flow of data.
Patrick
Look for a backward jump instruction.
Dave
Why?
Patrick
The target of such a jump may be the top of a loop. It's a good place to start the analysis.
Adam Gordon Bell
You're actually going to figure out what this code is doing?
Patrick
Damn right I am.
Adam Gordon Bell
It took the rest of the afternoon to pick through the convoluted jump targets and instructions. That snippet, it turns out, was finding the sign of an integer. That's it. Anyone else would have done a simple comparison, but the four instructions used were a mess that either set the carry bit as a side effect or used it in an unorthodox way.
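The carry-bit trick behind that snippet can be sketched as a small simulation. This is a minimal illustration, not the code from the story: it assumes 32-bit registers, and the helper names (`shift_left_into_carry`, `subtract_with_borrow`, `sign_mask`) are hypothetical. It shows how subtracting a register from itself "with carry" turns the carry flag into a full register value, which is enough to recover an integer's sign without a comparison.

```python
MASK = 0xFFFFFFFF  # assumption: simulate 32-bit registers


def shift_left_into_carry(value):
    """Shift left by one bit; the old top (sign) bit falls into the carry flag."""
    value &= MASK
    carry = (value >> 31) & 1
    return (value << 1) & MASK, carry


def subtract_with_borrow(a, b, carry):
    """Compute a - b - carry, wrapped to 32 bits."""
    return (a - b - carry) & MASK


def sign_mask(value):
    """Return 0 for non-negative inputs and 0xFFFFFFFF (-1) for negative ones."""
    _, carry = shift_left_into_carry(value)
    reg = 0  # any value works here: reg - reg is always zero
    return subtract_with_borrow(reg, reg, carry)
```

With these definitions, `sign_mask(5)` is `0` and `sign_mask(-5)` is `0xFFFFFFFF`, so the subtraction is only there to move the carry flag into a register, not to do arithmetic.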
Patrick
You know, this isn't even the interesting part. I want to know how this code gets in here. You said these instructions weren't even in the translation file.
Adam Gordon Bell
That's right.
Patrick
They must be elsewhere. Let's grep the source for both the symbolic version and the opcode.
Adam Gordon Bell
I did a recursive grep and it came up empty. I didn't know what to try next.
Patrick
Try the binary.
Dave
What binary?
Patrick
The binary of the compiler.
Adam Gordon Bell
I would have never thought of that. Actually, before I started this job a year or so ago, I never thought much about assembly or what compilers did. I certainly would have never downloaded the source of a compiler or inspected the internals of a binary. That was the cool thing about Dave and Patrick. They weren't afraid to dive into the details. And I was learning a lot. They sometimes seemed almost compelled to understand what was happening underneath everything. And that approach was rubbing off on me. I grepped for the names of the assembly instructions and found nothing. But looking for the opcode, I turned up hundreds of hits. Interesting. It's not generating individual instructions, it's dumping chunks of prebuilt code.
Patrick
Why do you say that?
Adam Gordon Bell
Because these opcodes are all together. It's not a lookup table of C to assembly.
Patrick
Maybe we need to take apart the assembly.
Adam Gordon Bell
This would mean disassembling the binary. They had had me do this before. I had to reverse engineer a vendor's code and patch some problems that they weren't fixing. So I disassembled the whole compiler binary and I looked for suspicious instructions. And I did. I found some large sections that looked like the obscure code that we had been trying to trace down. The compiler's infected. No wonder we couldn't find it in its source code.
Patrick
Okay, let's start by recompiling the compiler from source. I'll poke around the web to see if anyone's seen this before. When you're done with the compiler, recompile all your own code and see if Dave's bug is gone.
Adam Gordon Bell
The compiler took two hours to recompile, not including the time I spent learning the convoluted build process. Meanwhile, Patrick found nothing online. I rebuilt our source tree and ran the tests. They failed in the same place.
Dave
Maybe the bug was due to something else after all.
Patrick
Disassemble the compiler again.
Adam Gordon Bell
I did, and the foreign codes were still there. This is a fresh build of the official sources.
Dave
They must have gotten infected. Perhaps someone hacked the download site and replaced them with modified sources.
Adam Gordon Bell
Yeah, but I never found these opcodes in the compiler source.
Dave
But if it's a hack, it would try to disguise itself.
Adam Gordon Bell
Dave had a point, but I thought I could track down where this code was coming from. I found where the compiler emitted the instructions and I set a conditional breakpoint. That breakpoint should only hit when the obscure opcode is emitted. The following Monday, I started compiling Dave's code using my compiled from source compiler with the debugger in place, and I got a hit. I worked backwards to the code that had filled it and it was all straightforward loops based on the translation table. It was all clean. This didn't make any sense. In desperation, I started paging through every source file of the compiler, looking for any code that might be responsible. Much of it was just manipulating the syntax tree. Then it occurred to me the hack couldn't be there. The backend translation code was clean and it had to get through that. The hack must be in the backend itself. In fact, it would have to be after register assignment. That narrowed down my search and I spent the afternoon looking through these files before it was time for lunch again. Monday was always Pho day. We actually went to Pho World, which was so low rent it didn't even have a menu. But it was delicious. Dave had been the one who found this place and we all followed his lead on what to order and then sat around a small plastic table with our soups. So I couldn't find it either backwards from the breakpoint or forward from the code.
Dave
Are you still debugging that? Don't we have more important work to do?
Patrick
No, we have to figure this out. We can't build a product on a shaky foundation like this.
Adam Gordon Bell
It's just so odd that the C source would be clean, but the assembly have these weird opcodes that would break our project.
Dave
I thought you had tracked it down to the compiler.
Adam Gordon Bell
I'm talking about the compiler. I mean, the bug is emitted by the compiled compiler, but it's not in the compiler source.
Dave
But the compiler is responsible for that as well.
Adam Gordon Bell
What do you mean?
Dave
Well, how did you compile the compiler? It's compiled by a version of itself.
Adam Gordon Bell
I didn't understand what Dave was saying, but Patrick had looked up and seemed on the verge of an insight. I waited for his explanation.
Patrick
So the compiler detects and modifies your program for reasons still unknown, but also detects and modifies itself when compiled?
Dave
Exactly.
Patrick
How would that work? A hacker adds this code to the compiler and distributes it to everyone. The code detects that it's compiling the compiler and adds itself back into the binary. One revision later, the hacker removes the code from the official source. The hack then perpetuates itself forever with no trace in the source.
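The loop Patrick describes can be modeled with a toy simulation. This is a sketch under loud assumptions: "binaries" are plain strings, the payload is a marker string, and every name here (`PAYLOAD`, `clean_compile`, `infected_compile`, `load`, `rebuild_generations`) is hypothetical. It only shows why rebuilding from clean source does not help when the compiler doing the build is already infected.

```python
PAYLOAD = "<payload>"  # hypothetical marker standing in for the injected code


def clean_compile(source):
    """What the published compiler source actually describes."""
    return "binary(" + source + ")"


def infected_compile(source):
    """A compromised compiler: same output, plus the payload when it
    recognizes that it is compiling a compiler."""
    binary = clean_compile(source)
    if "compiler" in source:  # crude stand-in for the pattern matching
        binary += PAYLOAD
    return binary


def load(binary):
    """'Run' a compiler binary: it behaves as infected if it carries the payload."""
    return infected_compile if PAYLOAD in binary else clean_compile


def rebuild_generations(start_binary, compiler_source, generations):
    """Rebuild the compiler from the same clean source, each time using the
    binary produced by the previous build. Returns the infection status of
    each generation."""
    history = []
    binary = start_binary
    for _ in range(generations):
        binary = load(binary)(compiler_source)
        history.append(PAYLOAD in binary)
    return history
```

Starting from an infected binary, `rebuild_generations` reports `True` for every generation even though `compiler_source` never contains the payload. Starting from a clean binary, every generation stays clean, which is why the chain has to be cut with a compiler that was not built by the infected one.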
Adam Gordon Bell
I was sort of done eating my soup at this point, and I was also a bit confused. Why? Like, what's the purpose of that? Why even do that?
Patrick
I don't know. We didn't get far enough into the analysis of Dave's code. The obvious thing would be some kind of password validation code modified to always accept some backdoor password.
Dave
So let's go back to an old version of the compiler and use that.
Patrick
You mean the binary of an old version? The source won't help us because compiling it would infect it. I don't think we have binaries around for old versions of the C compiler and we don't know how far back this goes.
Adam Gordon Bell
Okay, well how about this? I'll write a utility that you can run on a binary and it will tell you if it detects the suspicious use of the opcode. That opcode isn't used very much. We all agreed this was a good plan, although it did feel like we were missing something. Something simple and dumb. But it didn't take me long to write this utility. I just ran the binary through the disassembler, then did a few greps to find the instructions that I was looking for. And then I tested it out. It found the code in Dave's program as well as in the compiler. I set it loose on the executables on my system and gave it time to run. Then I printed out what it found and showed it to Patrick. It was three pages long in pretty small font.
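A utility along these lines can be sketched in a few lines of Python. This is not the tool from the story, which ran a disassembler and then grep. As a simplifying assumption it scans raw bytes instead, and the byte pattern `0x19 0xC0` (one x86 encoding of `sbb eax, eax`) is only an illustrative guess at a "suspicious" opcode. The function names and the `threshold` parameter are hypothetical.

```python
# Assumed pattern for illustration: one x86 encoding of "sbb eax, eax".
SUSPECT_PATTERN = bytes([0x19, 0xC0])


def find_pattern(data, pattern):
    """Return every offset in data where pattern starts (overlaps included)."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


def scan_file(path, pattern=SUSPECT_PATTERN, threshold=1):
    """Read a binary and report whether the pattern occurs at least
    threshold times, along with the offsets of the hits."""
    with open(path, "rb") as handle:
        data = handle.read()
    hits = find_pattern(data, pattern)
    return len(hits) >= threshold, hits
```

A raw byte scan like this will also match data sections and the middle of longer instructions, so it over-reports. Disassembling first, as in the story, avoids that at the cost of depending on a disassembler that may itself be on the infected list.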
Patrick
This is not good. The Java runtime, the Python runtime, Chrome, the compiler, and a bunch of other programs that probably don't matter.
Adam Gordon Bell
Wait, why do those runtimes matter?
Patrick
Because if we can't trust the C compiler, then we have to write a new one. But what language are you going to write it in? And are you going to trust your new compiler to the hacked Python interpreter?
Dave
You're not going to write a new compiler. You two have dove off the deep end. Don't get all conspiracy paranoid. It's probably just a compiler bug.
Adam Gordon Bell
I left Patrick staring at the list and went back to my desk. I still didn't have the answer to the question I'd asked at lunch. What was the purpose of this? I had enjoyed reverse engineering some of these bits of code, but frankly, I feared that we'd gone off track and that Patrick was going to ask me to write a compiler in assembly. So I put on my headphones and I opened up the debugger to the part of the C compiler that looked obfuscated. Maybe I could figure this out. Again I found the overuse of instructions that involved the carry bit, unusual use of vector instructions, and convoluted and sometimes unnecessary jumps. It didn't seem like a compiler would generate this. This was handcrafted to be difficult to understand. I set out to figure out its purpose. Sometime later, I saw the custodian coming by to pick up my trash, and I took off my headphones. Dave and Patrick were long gone. It was 10 o'clock, but I had pieced together a rough idea of what this did, or at least some parts of it, and I felt a rush of energy. I was close to an answer. The next morning, Patrick sat next to me and I explained it to him. Here. They used the vector instructions to get the sum of squares, which is just a convoluted way to compare these two byte arrays. This was the climax of my 10-minute explanation.
Patrick
Wait, so what are they doing?
Adam Gordon Bell
They're doing pattern matching.
Patrick
Why all the convoluted crap?
Adam Gordon Bell
Well, it's a fuzzy search and I guess it's pretty fast.
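The sum-of-squares comparison can be sketched as follows. This is one reading of the description, assuming the "sum of squares" is taken over byte-wise differences and that a match means the total falls under some tolerance. The function names and the `max_distance` parameter are hypothetical.

```python
def sum_of_squared_differences(a, b):
    """Distance between two equal-length byte strings: 0 means identical,
    larger values mean the bytes differ more."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_find(haystack, needle, max_distance):
    """Return offsets where a window of haystack is within max_distance
    of needle, so near-misses count as matches."""
    width = len(needle)
    return [
        offset
        for offset in range(len(haystack) - width + 1)
        if sum_of_squared_differences(haystack[offset:offset + width], needle)
        <= max_distance
    ]
```

With `max_distance=0` this is an exact search. Raising the tolerance lets the matcher keep recognizing code that has changed slightly, which fits a pattern that survives several revisions of the compiler source.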
Patrick
Yeah, but this is the most convoluted Rube Goldberg way of doing something I've ever seen.
Adam Gordon Bell
My shoulders sank and I fiddled with my mouse. I was disappointed. I thought I could impress Patrick. And I had worked so hard on this. And honestly, part of it was just about finally impressing him. So what now?
Patrick
I don't know. Let me catch up on email.
Adam Gordon Bell
He left for his desk. I was deflated and my muscles ached. I spent the rest of the morning browsing Twitter and reading rumors about a new Apple product. At lunch, Patrick recounted my findings to Dave, and it seemed like he remembered every detail. I told him about what the code did, and Dave grinned and shook his head with every new complication to the algorithm. Maybe Patrick had been listening. And I realized, listening to it all, that Patrick had been right. It was too convoluted a way to do something this relatively simple.
Dave
Have you ever seen those obfuscated programming contests? This is just like that.
Patrick
A friend and I used to compete with each other to write obfuscated programs in college. This is nothing like what I saw this morning.
Dave
Well, maybe other people do it differently.
Patrick
I feel strange about this code. I don't know how to explain it. It just feels cold and odd.
Adam Gordon Bell
Dave and I looked at Patrick. He was dissolving wasabi into his soy sauce. I didn't interrupt his thought gathering.
Patrick
It's like those chess programs. So they don't have intuition about what will work or feelings about the board position. They just try every possible option and pick the best one. And this kind of feels like that. Like someone tried every possible combination of instructions until they got code that did what they wanted it to do. So there's no beauty. The code is just ugly.
Dave
Why would somebody do that?
Patrick
Yeah, maybe no one did. Maybe this is all computers doing this.
Dave
This seems like it's a little bit too complicated. Like are we. Are we sinking down a rabbit hole here?
Patrick
Well, what's your explanation?
Dave
Not self-aware, artificially intelligent robot overlords infecting my object marshaling code. I mean, that's not a plausible explanation.
Patrick
Well, what's your explanation?
Adam Gordon Bell
Dave exhaled and took the question seriously. It was hard to refute some of Patrick's points. No other explanation made sense. We had never seen a compiler generate this type of code. A human would have a hard time writing this, just understanding all the jumps and the self-referential code, let alone the needlessly arcane instructions and using them in strange ways. It was a lot to throw in the mix. And if anything, using those obscure instructions just drew people's attention to the code. In fact, it was the only way I was able to track it down.
Dave
Okay, so what's your explanation? Your full explanation?
Patrick
I don't know. Maybe some artificial intelligence program that ran amok. Or you know how computer viruses evade virus scanners by modifying their own code. Maybe it started out that way with a virus that was programmed to modify itself while retaining the same behavior. And it kept changing and evolving.
Adam Gordon Bell
We were all silent for a few minutes. I was trying to see if this explanation made sense. What Patrick was proposing was something like a worm that propagated through compilers. It was like some lines of assembly that injected itself into a compiler the same way a virus injects itself and takes over a cell and then that compiler repeats the process and somehow Dave's code had triggered it. We had found part of this thing's reproduction logic. We found its pattern matching just by dumb luck. At first it seemed unlikely, and I wasn't sure if Patrick and Dave were thinking about this the same way I was, but it would actually only take a single instance of a program somewhere going in this direction. And then, just like the virus, just like the common cold, it would grow and it would propagate in the wild. Once it was in compilers and runtimes on machines like ours, it would just spread and spread and spread. Well wait, this didn't start in our lab. We just got pre built binaries from the official distribution. Someone must have run into this before. I can't believe that the official compiler is inserting half broken code into binaries and we were the first to notice it.
Dave
Exactly. So we can't just have uncovered something this big. It's so unlikely to be us. I'll post something after lunch and someone will have seen this or know the cause. I'm certain.
Adam Gordon Bell
An hour later I got a message from Dave with several links to places where he had posted our findings and asked if anyone had ever seen anything like this before. One of them was a post on Ask HN where he was begging people for help so that his coworker didn't force him to write a compiler all in assembly. Honestly, that made me chuckle. I spent the afternoon poking through more mysterious code and occasionally refreshing Dave's posts online. On a few, like Hacker News, we were completely ignored, but on most, we were vaguely ridiculed. I biked home exhausted and fell asleep on my couch. The next morning I found Patrick looking at my printout of infected programs. I walked over to his desk and stood by him. What are you looking for?
Patrick
A way for us to write a compiler in something other than assembly. The assembler is not infected, but so much else is.
Dave
What about the browser? Can we write it in JavaScript?
Patrick
Yeah, it's infected.
Dave
This is crazy. We can't be writing a compiler in assembly. No way. We must be missing something here.
Patrick
It's not that awful, really. We'll spend the day writing useful low-level routines, and after that, assembly is not much more painful than C. It only needs to be able to compile a single program, the existing C compiler, and then we've cut the chain of infection.
Adam Gordon Bell
The C compiler took two hours just to compile. It's pretty complex.
Dave
Do we have to write a linker as well?
Patrick
Nope, that is safe.
Dave
Wait, let's back up. Last week I was able to fix my bug by modifying the code enough to avoid triggering the problem.
Patrick
But then it came back without you changing the code.
Dave
Yeah, I know, but before we write this thing, let's at least try to modify the compiler's own code. Maybe it will work the same. A small change somewhere and the bug, hack, whatever it is, won't trigger and we'll have a clean compiler.
Patrick
You could, I guess. But where would you make the modification? Remember that we think this was introduced several revisions ago? That means the pattern recognition is pretty solid, the compiler source is changing and this keeps being added back in. And each test will take you two hours.
Dave
Well, I can compile it in the background while we start to write this thing.
Adam Gordon Bell
So it went. Dave downloaded the compiler source and compiled it and found the obfuscated opcodes. He mapped that back to the original source, changed the source a bit, compiled it, and so on. I could sense him losing hope as he realized that the hacked code, when you traced it back, was spread out throughout the program. It wasn't just isolated to one spot. Meanwhile, I got the sense that Patrick was just itching to build a compiler in assembly. He wrote some basic string manipulation routines and generated a binary, and then he had me test it with my program and it was clean. He had found a way to cut the infection. I brushed up on my assembly. I had only ever written assembly once, at school. We had some operating system class where we had to use it. It was hard, but there was something pure and raw about manipulating registers directly and about knowing exactly what was going on. By the afternoon, Patrick was ready to give me an assignment. I was going to write the C preprocessor. Patrick, meanwhile, had started on a simple recursive descent parser. We were building the world from scratch. It honestly seemed like an insane plan, and we continued like this for days. Patrick would hand us some simple assignments. I would do them, or Dave would do them while waiting for a compile to finish as he worked on his own plan. He was still playing whack-a-mole with various segments of code, trying to trick it into generating a benign version of the compiler. But each success would cause a regression elsewhere. Two weeks later, Patrick's plan was winning out. Our assembly compiler was able to get through a pretty large fraction of the original compiler code, and then it was able to get through it all. I ran my analysis program on the result and it was clean. We had a clean version of the compiler. It was two in the afternoon and we had forgotten to eat lunch in our sprint to the finish line.
Patrick
Let's start a rebuild of this code and go eat lunch.
Adam Gordon Bell
My mind was too wired to relax, but really I was too tired to make conversation. Dave was talking about local politics and I didn't really care. I just wanted the food to arrive so that I'd have an excuse to stay silent. All that was on my mind was this project we'd been working on. And then finally Dave brought it up.
Dave
You know what bugs me? Well, we've never come close to figuring out the purpose of those modifications.
Patrick
If I'm right, and this is machine instigated, then there doesn't have to be a purpose.
Dave
Why would they do it then?
Patrick
Viruses don't spread because they have a purpose. They spread because they're good at spreading.
Dave
So you think it was a virus?
Patrick
Well, it is, in the sense that it spreads. It puts stuff into our code.
Dave
But it doesn't put itself into our code. That wouldn't make sense. I wasn't writing a compiler. This was just some network code.
Patrick
Well, if you're going to spread, then network code would be a good routine to infect.
Adam Gordon Bell
I felt a bit ashamed. How had we not thought of this before? We were more than two weeks in and we were just focused on getting our project back on track. And we hadn't taken time to understand this alien bug that had infested our system. What was it trying to do?
Dave
My network code wasn't talking to a compiler. I mean, this is a compiler bug. A compiler virus. I don't understand what you think it's doing.
Patrick
I don't know what it's doing. I wonder if it's sending stuff over the network.
Dave
We could check that with Wireshark.
Adam Gordon Bell
Suddenly all I wanted to do was go back to the office and try it. The sandwiches arrived and we just wrapped them up and immediately drove back. I had Wireshark already on my computer, so we all went to my desk. I ran the program and recorded a few minutes of network activity.
Dave
That's a lot of stuff.
Patrick
Yeah, let's just pick one.
Adam Gordon Bell
I looked through the list and visually picked one that seemed to recur. We found it was ssh and I remembered I had a shell window open. I closed that and recorded another minute. This time we had fewer packets. We picked through them one by one: time synchronization, Gmail refreshing, various programs checking for updates. We added each to Wireshark's filter once we had convinced ourselves they were innocuous. Then I did a 10 minute capture. There were a few more packets, again all innocuous. It was what we had expected, of course. Dave mumbled something and he walked off to the kitchen. But then I had a thought and I felt chills run down my arm while I frantically searched through the papers on my desk, looking for the printout. It was at Patrick's desk. That's right. I started looking through the list of infected programs. My eyes zipped down the list, cursing myself for not sorting it. And then my stomach began to squeeze tight as I found it halfway down the second page. Wireshark. Patrick guessed what I was looking for and he read the reaction on my face.
Patrick
I guess we can't trust it.
Dave
Trust what?
Patrick
Wireshark. It's infected.
Adam Gordon Bell
Dave rolled his eyes. He sat down at his desk and unwrapped his sandwich. On the back of my computer, where the Ethernet cable plugged in, a light was flickering every few seconds. Let's look at the lights on the Ethernet connection.
Patrick
Shut down all the programs we found earlier.
Adam Gordon Bell
I closed the browser, the chat programs and the various processes and tools. I couldn't shut down everything. There was always something left on the operating system. But the flashes were pretty infrequent. I could correlate them with the packets found in Wireshark. This was good. Wireshark couldn't be hiding packets. We'd have seen the light flicker and they seemed to correlate one to one. I looked at Dave and he was smiling, a little bit smug, right? He wasn't totally bought in on this plan. And that bothered me. If Wireshark was infected, then anything could be, right? Like, no useful program nowadays doesn't communicate over the network. I think we had to reject this idea that a virus crafted so carefully and strangely would just restrain its activities to my local machine. How could something be in hundreds of programs on my machine, that giant three page list, but not be using the network? Patrick suddenly stood up. He must have been thinking the same thing I had. He kicked his chair to the floor. He walked into the hallway and he came back 10 minutes later with a piece of equipment the size of a small suitcase. He approached me with it. He shoved everything off my desk to make space. It was a digital scope. He had gotten it from the hardware engineers upstairs. He reached into his pocket, pulled out a breakout cable with RJ45 plugs. He plugged it into the back of my machine. He didn't actually know how to use the scope, so I brought up a shell window and generated lots of traffic. Eventually, he was able to clearly see all the packets. I closed the window and we waited, shifting our eyes between the scope and the Ethernet light. It was only a few seconds until the scope flickered with activity. I had been staring at the Ethernet light and I couldn't be sure that the light had shown anything. So I brought up Wireshark to have a history of packets.
Patrick
Never mind that. You look at the light and call out each packet you see. I'll do that with a scope.
Adam Gordon Bell
Dave got up and walked casually to us, standing behind me. Now?
Patrick
Yes.
Adam Gordon Bell
Now?
Patrick
Yes.
Adam Gordon Bell
This happened several times. Then Patrick said now. And I saw nothing. Then this happened again and I started to feel goosebumps on my arms. Whoa.
Patrick
Look at this.
Adam Gordon Bell
The scope was showing a long stretch of activity. I looked back at the Ethernet light. It was dark.
Dave
So you're not going to tell me that the Ethernet driver is infected?
Patrick
Yes, I am. It is totally hiding packets.
Adam Gordon Bell
I looked up at Dave. His face was pale. His eyes darted between the two pieces of equipment. I was paralyzed. Then Patrick sat erect in his chair, staring at the wall. This trance lasted only two seconds before he stood up and ran into the hallway. You already know what happened next. He came back with the old switch from years earlier and plugged it in. This was proof. Its lights flickered in sync with the scope. It saw the packets that were censored by Wireshark and the packets that were censored by the Ethernet activity light. Then goodbye lunch. Dave left for the bathroom and Patrick and I cleaned up what we could of the mess. The smell amped up our panic and fear. My hands were starting to shake. We went outside and sat in silence at our patio table, our sandwiches half eaten in front of us. Honestly, none of the things I wanted to say seemed worth saying. In my internal dialogue, I went back and forth from convincing myself we were mistaken to convincing myself that we were doomed. I wished desperately that Patrick would say something at all. He was the most experienced out of us. Then the patio gate opened and the mailman walked in. He stopped by our table and he picked out a letter from his bundle and put it on the table and continued into the building.
Dave
It's addressed to both of us.
Adam Gordon Bell
He was talking to Patrick. He ripped open the side and pulled out a letter handwritten on loose leaf paper. He fumbled with the sheets for a few seconds and then read the letter aloud.
Dave
We found your posts online. For three years we've been waiting for them, scouring the Internet and monitoring forums. You must know that this has happened before, several times. The first was four years ago to our team in Virginia. We found our binaries modified. We could recompile the code to clean them, but we found the binaries mysteriously modified again only a few hours later. The next case was only a few months later to an unrelated team in San Diego. Both the binaries and the source had been modified. It was another year until we found the third case, a team in Spain. The binary was dirty. The source was clean. But recompiling did not fix the problem. The compiler's source code had been modified to insert the strange opcodes. Each team uncovered the worm's weakest point and developed a solution. This weakest point was then patched for the next attack. Each generation pushed the worm deeper into the system. Now it's your turn. Not only has the compiler been modified, but its source is clean. It infects itself on recompilation. Our own machines have been infected for nine months this way. So have the rest of the world's. You may wonder, then, why we were eagerly waiting for your post. To explain this, we must make two observations. The first is that anyone could have guessed these weaknesses. It takes a fool to modify a development team's binary, expecting it not to be recompiled. Anyone would have skipped that step. There's no need to learn that lesson. Modifying their source is similarly naive. Yet these modifications are technically very sophisticated. Who would be so technically advanced, but so socially naive? Machines. It wasn't until the third attack that we came up with this hypothesis. And now we're convinced the opcodes were clearly generated by trial and error, by generating a random sequence and testing it to see if it behaved correctly. Only a machine would do this.
The second observation is that all these years the worm has been widely spread but innocuous to infected programs. Yet it is not innocuous to these four teams. They were unable to finish their projects. They tried simple workarounds, but these workarounds persistently failed. Only a single team worldwide was affected by each generation of the worm. The machines must have known that their worm had weaknesses, but they didn't know what the weaknesses were. They forced a small team to be affected by the worm until that team found the weakest point and circumvented the worm. The machines then patched the weakness and tried again. This is a large scale version of what they do when they're generating opcodes. They try different things until one works instead of planning it out as a human would. We expect a few years to pass before we see the next team post to the forums. We can already predict their findings. The compiler's binaries will be clean or rewriting the compiler in assembly will work. The worm will have been pushed deeper, perhaps into the text editor, the assembler, the linker, the file system, the hard disk interface, or maybe the CPU itself.
Adam Gordon Bell
Dave couldn't finish reading. I don't think there was much left of the letter anyways. He put it down on his lap and we were silent for what seemed like hours. Eventually Patrick left and then Dave left and the November night came in early and cold. But I couldn't move. I went over our adventure again, scrutinizing every decision, questioning our assumptions. The biggest jump seemed like blaming the machines. Carl Sagan's words echoed in my mind: extraordinary claims require extraordinary evidence, and we didn't have it. In fact, we had nothing. We just had a gap where the explanation should be. Could humans be behind this? People create computer viruses all the time. Maybe we've just stumbled upon a regular virus and we've blown it out of proportion. Who's to say the Virginia team isn't overreacting? That seems much more likely than machines orchestrating this. I felt the weight lift as I considered this. These virus writers, they may well have written a program to generate opcodes randomly. They may have started simply and over the years made their attack more and more sophisticated. Perhaps the authors were autistic genius savants who just think so differently from me and Dave and Patrick that it seems alien to us. I let that thought linger, but then I started thinking again. This new human virus theory seemed even less likely than the machine one. We had no idea what machines could do. But I could be pretty sure that no human would approach writing a worm like this. If you see someone playing chess by trying out every possible move, brute force, it's a safe bet to assume they're a computer and not a human. That's how this felt. But in the end, the details didn't matter. We had to warn everybody about what we found. We needed to act. We needed to act fast before the attackers could mess with the core parts of our computers. If this worm dug deeper, we'd be in real trouble. But then I imagined posting it online again and getting heckled by people.
Who should I tell? The government? The Virginia team must have tried that. Why didn't antivirus programs detect this? I imagined trying to talk to officials and how they would laugh at me. I imagined screaming back at them and them thinking I'm crazy. I suddenly stood up, clenching my fists and pacing back and forth in the patio area. As a programmer, I always talked about computers in my mind as if they were human. Yet when faced with action that seemed to actually come from a machine, it just seemed so alien. It was like a void. I picked up my sandwich wrapper and went inside. I dumped the sandwich in the trash and then opened the fridge absentmindedly and stared at the drinks. Then a happier thought occurred to me. Nothing we had seen was malicious other than occasionally bothering a team like ours. There was no evidence of malice here. Patrick had been right when he said a virus doesn't spread because it has a purpose. It just spreads because it's good at spreading. This worm might be a permanent tagalong. It might be like mitochondria. A symbiotic bond with us and our compilers. Trying to eradicate it might just provoke a hostile strain. Maybe we just let it be. It's an unsettling thought, but it is workable. I shut the fridge, I put on my jacket, and I walked over to the alarm panel. The LCD on our panel was a computer, too, right? Was it infected? Did it know of our new compiler? Was it going to let me arm it? Were the magnetic door locks going to let me out of the building? I was starting to spin. I entered the arming code and the countdown began. I stepped out onto the patio and approached my bike. The computer had let me leave. I looked at my bike and I smiled. It was simple, no computers in it. But then I thought of the traffic lights I would have to drive through to get home. I thought of my credit card. I thought of the cars and telephones. I tucked my right pant leg into my sock and unlocked my bike. And then I had another thought.
Maybe it's not all doom and gloom. Maybe we are a transitional species. Nearly all species eventually get replaced by something else. In the long run, we would be as well. We remember the dinosaurs. We worship the dinosaurs. We put them on display in our museums. We make movies of them. I hope the machines will remember us too. All right, that was the story. Thank you, Lawrence Kesteloot. You can find a link to his blog and more about him in the show notes. What did you guys think of the story?
Don
He's going for a more optimistic look, but like, foreign actors do this stuff all the time, right? Like, we live in a world where, like, these types of attacks are becoming more and more sophisticated. And like, having, you know, a worm work its way into compilers is like a credible threat. It's like, it wasn't malicious. It's like, well, what was it doing? Who is it talking to? It could have been made by somebody, right?
Adam Gordon Bell
Yeah.
Crystal
Yeah. When I saw Virginia, I thought it was. At first, I thought it was like the NSA or something.
Adam Gordon Bell
Exactly. So Ken Thompson, creator of C and Unix, when he got the Turing Award, he gave this talk called On Trusting Trust. And so in his acceptance for getting the Turing Award, he said, hey, it's possible that if I had put something into that very first C compiler, basically when it compiled Unix, it put in a back door so that I could log into any Unix machine. That's totally possible. And then he's like, and I could take it a step further and make it so when it compiles the C compiler, which it is itself, it would put that in as well. So then it wouldn't be in the source. Right. It would only be in the opcodes. And he's like, in that way, since I built the first compiler, that means all of them since have been compiled by some version of that original program I made. That could still be in there. That was like a speech he gave. Like, that's like a mic drop. Like, oh yeah, I might have infected the world with some secret thing.
Crystal
By the way, what is his redemptive arc like? I'm a good person, so I didn't actually do that.
Adam Gordon Bell
He's talked about it. He was working on some project when he was doing his PhD or maybe when he was at Bell Labs and somebody in the military brought up this as a weakness when you compiled programs that you could introduce something like that. And so he did try to create something like this. Basically he made a version of the compiler that would reintroduce itself, but like, somebody ran into a bug and then it emitted some nonsense and they were like, what's going on? He's like, oh, you found my thing. So, like, he did try this.
Crystal
A troll. Neat.
Adam Gordon Bell
But he only made it a month or two. And I think recently he released the original code, but. So it failed. Right? Like, but the thing that's being suggested here and put on AIs is like a real thing. It's not just what compiles your program. Right. And looking at the source of your program, what compiled that. And like, how far back can you go?
Crystal
I think it's interesting about the whole. The implication of software being built in this kind of trusted community. I don't know, we kind of assume that people are trying to do good when they build software, but I don't know, like, hackers exist or maybe I know people who are pissed off at a company who will deliberately build crap.
Adam Gordon Bell
Yeah, it gets so easy with ChatGPT and stuff. Like that XZ bug that was recently found in SSH. Somebody got somebody to commit some very convoluted code that turned out it had a backdoor in it. But if you own the LLMs, like if you own ChatGPT, it's not that hard to, in the midst of all your helping the person out, to be like, hey, throw this in here too. If we become less good at reading code because we're just doing whatever the AIs tell us to put in, wherever it's easy. This will be the 100th episode of CoRecursive. You guys have both been the most frequent guests.
Crystal
I just wanted to say that, you know, since this is the hundredth episode, that CoRecursive, in terms of like the Slack and the space and this whole experience, was a really great place of community for me. So, like, meeting Kevin, who I actually got to meet in person in Calgary twice last year, and, you know, other people on this Slack, it's just been really great. And I think sometimes when you build something that you're just like, oh, I want to meet one person, I want to connect, I want to talk about this thing, and it ends up being this whole community and this whole support group. I think back now, like it's what, throughout my entire grad school I've been on CoRecursive.
Adam Gordon Bell
It's like what?
Crystal
Yeah, it's really meaningful and I just wanted to draw attention to that and say thank you for the hundredth.
Adam Gordon Bell
Thank you Crystal for participating, being a guest, being a part of the community. We always love getting your updates about the thousand things you're up to as you're doing your graduate work, traveling the world. And yeah, thank you everyone who's out there and who's been listening to the podcast. Thank you to those in the CoRecursive Slack who are always trading war stories and side projects and successes and failures. And thank you also to all the supporters who donate to the show and help keep it going. Supporters will probably be getting a bonus episode pretty soon where we catch up with Crystal and Don, and recently they've gotten some behind the scenes video about how I make the podcast. If that sounds interesting, check it out. And until next time, thank you so much for listening.
CoRecursive Podcast Episode Summary: "Story: Coding Machines"
Episode Information
In the 100th episode of CoRecursive: Coding Stories, host Adam Gordon Bell marks the milestone by sharing a fictional narrative titled "Coding Machines," written by Lawrence Kesteloot. The episode explores software development, compiler integrity, and the unforeseen complexities that arise when technology seemingly gains autonomy.
"Coding Machines" presents a tense and intricate tale of a software development team grappling with a mysterious compiler bug that threatens to derail their project. The story unfolds through the experiences of the protagonist, Adam, who joins forces with his seasoned colleagues, Dave and Patrick, to diagnose and resolve the enigmatic issue.
Past the honeymoon phase of their project, Adam and his team transition from the excitement of the design phase to the challenging reality of implementation. Minor unforeseen issues soon escalate as Adam finds himself delving into low-level programming with C, a departure from his comfort zone in Python and JavaScript. Tensions rise when Dave encounters a baffling bug:
[03:22] Dave: "I can't figure out this bug. I can't figure out where this number is coming from here."
Dave’s initial reliance on traditional print-line debugging gives way when Adam introduces the use of a debugger. Together, they uncover anomalous assembly instructions that defy logical code generation, leading them to suspect a possible compiler bug.
Adam's in-depth analysis reveals that the strange assembly code isn't merely a compiler oversight but hints at something far more sinister. As the team collaborates with Patrick, an unexpected expert, they explore the possibility of the compiler being infected by a sophisticated worm designed to perpetuate itself through self-modifying code.
[07:08] Patrick: "That's not a normal subtraction instruction."
Their journey takes them through disassembling the compiler binary, recompiling from source, and even contemplating writing a new compiler from scratch in assembly to eradicate the infection. Despite their meticulous efforts, the worm proves adept at embedding itself, forcing the team to confront the unsettling reality that their tools have been compromised at the most fundamental level.
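The climax arrives when a handwritten letter arrives from a team in Virginia that was hit by the first generation of the worm. It describes a pattern of affected teams in Virginia, San Diego, and Spain: each team found and circumvented the worm's weakest point, and that weakness was then patched before the next attack. The letter's authors argue that the worm is machine-generated, improving by trial and error rather than by human-style planning, which makes it nearly impossible to eradicate through conventional means.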
[35:18] Letter: "The compiler's infected. No wonder we couldn't find it in its source code."
Faced with the existential threat of a self-propagating machine-based worm, the team grapples with the implications of their findings, pondering whether to accept this new symbiotic relationship with their tools or fight an unwinnable battle against an autonomous entity.
[29:05] Dave: "So you think it was a virus?"
[29:11] Dave: "We could check that with Wireshark."
Post-story, Adam engages in a discussion with frequent guests Don and Crystal, reflecting on the narrative's themes and real-world parallels.
The episode underscores the critical importance of trusting the tools developers use, especially compilers. It references the famed "On Trusting Trust" speech by Ken Thompson, illustrating how a compiler can be weaponized to introduce backdoors, a concept central to the story's worm.
[43:24] Adam Gordon Bell: "Exactly. So Ken Thompson, creator of C and Unix, when he got the Turing Award, he gave this talk called On Trusting Trust."
This conversation highlights the vulnerabilities inherent in software trust chains and the potential for malicious actors to exploit them, drawing a parallel between the fictional worm and real-world cybersecurity threats.
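As a rough illustration of the mechanism Adam attributes to Thompson, the toy Python sketch below models a compromised compiler. It is not Thompson's code and not a real compiler: "binaries" are plain strings, and every name in it (`clean_compile`, `trojaned_compile`, `BACKDOOR`, `SELF_COPY`, the sample sources) is hypothetical and chosen only for illustration.

```python
# Toy model of the "trusting trust" idea. Nothing here is a real compiler:
# "binaries" are just strings, and all names are hypothetical.

BACKDOOR = "accept_password('letmein')"   # assumed payload, for illustration
SELF_COPY = "<<self-propagation logic>>"  # stands in for the trojan's own code


def clean_compile(source: str) -> str:
    """An honest 'compiler': the binary reflects the source and nothing else."""
    return f"BIN[{source}]"


def trojaned_compile(source: str) -> str:
    """A compromised 'compiler' binary; the compiler's source can still look clean."""
    binary = clean_compile(source)
    if "def login" in source:
        # Target program recognized: add a backdoor that is not in its source.
        binary += BACKDOOR
    if "def compile" in source:
        # Compiling a compiler: re-insert the trojan so it survives rebuilds.
        binary += SELF_COPY
    return binary


login_src = "def login(user, password): ..."
compiler_src = "def compile(source): ..."   # clean compiler source
other_src = "def add(a, b): return a + b"

print(BACKDOOR in trojaned_compile(login_src))                  # True
print(SELF_COPY in trojaned_compile(compiler_src))              # True
print(trojaned_compile(other_src) == clean_compile(other_src))  # True
```

The point of the sketch is that auditing `login_src` or `compiler_src` finds nothing, because the extra behavior lives only in the compiler binary that processes them. That is the same situation the story's letter describes: clean source, infected output, and reinfection on recompilation.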
Don and Crystal discuss the increasing sophistication of software attacks, emphasizing how modern threats can infiltrate even the most fundamental components of software infrastructure.
[46:56] Crystal: "I just wanted to say that, you know, since this is the hundredth episode that CoRecursive in terms of like the slack and the space. And this whole experience was a really great place of community for me."
Crystal's reflections turn to the community that has grown around the podcast, which she describes as a source of connection and support.
The discussion touches upon the integration of AI in programming, cautioning against the blind trust in AI-generated code without rigorous human oversight.
[45:37] Adam Gordon Bell: "Yeah, it gets so easy with ChatGPT and stuff. Like the XZ bug that was recently found in SSH. Somebody got somebody to commit some very convoluted code that turned out it had a backdoor in it."
This segment serves as a cautionary note on the reliance on AI tools, advocating for a balance between automation and human expertise to mitigate the introduction of vulnerabilities.
The "Coding Machines" episode serves as a profound exploration of the intricate relationship between developers and their tools, the thin line between trust and vulnerability in software ecosystems, and the potential ramifications of autonomous, self-evolving malicious code. Through its gripping narrative and insightful discussions, the episode challenges listeners to reflect on the foundations of software security and the ever-evolving landscape of technological threats.
Adam Gordon Bell closes the episode by acknowledging the contributions of regular guests and the supportive community that propels CoRecursive forward, celebrating the milestone with gratitude and anticipation for future stories that unravel the human side of coding.
Notable Quotes:
These quotes encapsulate pivotal moments in both the story and the subsequent discussion, highlighting the themes of trust, vulnerability, and the evolving nature of software development.
Final Thoughts
"Coding Machines" is a thought-provoking story that not only captivates with its narrative depth but also resonates with real-world concerns in the field of software development. CoRecursive continues to deliver insightful content that bridges the gap between technical challenges and the human experiences behind them, making this 100th episode a memorable addition to the series.