
PCSX2 is an open-source PlayStation 2 emulator that allows users to play PS2 games on modern hardware. The emulator is remarkable for simulating the complex architecture of the PS2, which includes the Emotion Engine CPU, the Graphics Synthesizer, and specialized subsystems. The emulator just hit a major milestone with the release of PCSX2 version 2.0. The release brings many changes, including a Qt-based interface, Big Picture Mode, automatic selection of graphics APIs, and native support for macOS. TellowKrinkle is a developer for PCSX2 who ported the emulator to macOS, among other contributions. In addition to his work on PS2 emulation, he has also worked on Dolphin, which emulates the Nintendo GameCube and Wii. Tylo joins the podcast with Joe Nash to talk about how he got started in emulation, the PS2 architecture, the challenges of rendering PS2 games on modern GPUs, and more. Joe Nash is a developer, educator, and award-winning community builder who has worked at companies including GitHub, Twilio, Unity, and PayPal. Joe got his start in software development by creating mods and running servers for Garry's Mod, and game development remains his favorite way to experience and explore new technologies and concepts.
Joe Nash
Welcome, Tylo. How are you doing today?
Tylo
Pretty good. I'm happy to be here.
Joe Nash
Wonderful. So I want to start off by asking you this. We've spoken to folks from the emulator scene before, and it seems people come into this whole thing from many varied paths. How did you get started building emulators?
Tylo
Yeah, I think my path is probably one of the weirder ones. To start, I looked up to emulators and emulator development a lot when I was younger, so for a long time I wanted to try working on the Dolphin emulator, and I never ended up getting around to it. But then, after graduating college, I took a part-time job for a while and had some extra time, and I was like, ah, maybe I can work on an emulator. At that point I looked around and I was like, hey, it would be cool to work on the PCSX2 emulator. At the time it didn't support macOS, and I was using macOS as my main operating system. I guess I still am at the moment. And I was like, it would be cool to have a Mac version of that, right? So that was my plan. I knew they previously had a macOS interface. They had old builds from probably four or five years before, maybe even older. So I was like, oh, well, it must be at least somewhat compatible. Not compatible, but, you know, there's...
Joe Nash
Some legacy of it working on the platform there.
Tylo
Yeah, exactly. It's like, okay, we'll just try to get it working. So I started on that, and it turned out it wasn't too bad to at least get the interface showing up. Not the actual emulation or anything, just the interface. There were some minor bugs. At the time, on Windows, they were using a mixture of wxWidgets and direct Win32 stuff. On Linux they used wxWidgets, and where Windows had Win32 stuff, they used GTK. There is a GTK version for macOS. It's not amazing. It's kind of terrible, actually, but it does technically work. So I used that on macOS for the initial build, and we got some windows showing up. From there, two things mainly stood out. The first is graphics APIs. At the time, PCSX2 supported two of them: OpenGL and DirectX 11. DirectX 11 is just a Microsoft thing, so no one else supports that. And macOS had kind of abandoned OpenGL probably about three years before, and had completely stopped updating it to keep up with the latest features. PCSX2 used to have a whole bunch of workarounds for the lack of various features, and a whole bunch of them had been deleted from the emulator about four years before. So my first attempt, and at this time I had no graphics programming experience whatsoever, was to undo the deletion of all of these fallback paths. PCSX2, like pretty much all emulators and software projects in general, is under a system called version control, where every change to the emulator is tracked as a sequence of changes. So you can go back to an old one and just be like, can I just undo that one, please?
So I started by trying to undo all of the commits that had removed these feature workarounds. For a few of them, the code had changed too much since then to directly undo them, so I tried to fix those up by hand, and it didn't fully work. I was still stuck with a black screen. At that point I brought it to a Linux computer, artificially disabled all of the features that weren't supported on macOS, and then enabled them one by one to figure out which workaround my attempt to bring back was broken on. Eventually, through that, I was able to get at least an image showing up on macOS.
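The one-by-one search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature names and the `renders_ok` check are hypothetical stand-ins, not PCSX2 code, and the sketch assumes exactly one restored fallback path is broken.

```python
from typing import Callable, Iterable, Optional, Set


def find_broken_fallback(
    features: Iterable[str],
    renders_ok: Callable[[Set[str]], bool],
) -> Optional[str]:
    """Find the feature whose fallback path is broken.

    `renders_ok(disabled)` is a hypothetical check that runs the renderer
    with the given set of features force-disabled (so their fallbacks are
    used) and reports whether a picture appears.
    """
    all_features = list(features)
    if renders_ok(set(all_features)):
        return None  # every fallback works, nothing to find
    for feature in all_features:
        # Re-enable just this one feature; every other fallback stays active.
        disabled = set(all_features) - {feature}
        if renders_ok(disabled):
            return feature  # the picture came back, so this fallback was at fault
    return None  # more than one fallback is broken; this simple search gives up


# Usage with made-up feature names and a fake renderer check:
features = ["texture_barrier", "dual_source_blend", "clip_control"]
fake_check = lambda disabled: "dual_source_blend" not in disabled
print(find_broken_fallback(features, fake_check))
```

Testing each feature in isolation keeps the result unambiguous when a single fallback is at fault, which is the situation the sketch assumes.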
Joe Nash
That's a really interesting way to get into an open source project: getting into the git blame and bringing back some old features. It's also fascinating to hear that you weren't into graphics programming before you got into the emulator because, having looked a little bit at some of the things you work on outside the project, it seems very much now that graphics is your world.
Tylo
Yeah. Since then I've done a lot of graphics stuff. But when I started on the emulator, that was actually my first usage of OpenGL, and I was blindly going through there being like, oh well, they had this before, so it probably works something like this. We'll just try this. I had no clue what I was doing and didn't know how to fix my black screen without comparing it to it running on Linux to figure out which part was breaking.
Joe Nash
Amazing. Okay, so I want to get into talking about the emulator itself, how it works, and your work on it, especially on the Graphics Synthesizer. But first, because I know it's a fairly wild platform, to talk about the emulator we should talk about the PlayStation 2 as a system and architecture. Can you tell us a little bit about how the PlayStation 2 worked and all the pieces?
Tylo
Yeah, so the PlayStation 2 is like three or four processors all connected via a DMA, a memory copy engine. There's the main CPU, which is known as the EE, or Emotion Engine. That's a roughly 300 MHz MIPS CPU with a few weird instructions on it. Then, attached to that CPU, or I guess sitting nearby, there are two vector units, which each use a custom instruction set that's just for vector processing. If you've heard of SSE on Intel CPUs, which allows the CPU to operate on four values at once with each instruction, they're kind of similar to that, except it's a very dedicated instruction set that doesn't really have much else outside of being able to do these vector float operations.
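As a rough picture of what "four values at once with each instruction" means, here is a minimal sketch of a lane-wise multiply-add. The function name is made up for illustration and does not correspond to a specific VU or SSE instruction.

```python
from typing import Tuple

Vec4 = Tuple[float, float, float, float]


def vec4_multiply_add(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """One conceptual vector instruction: a * b + c on all four lanes.

    Real vector hardware does the four lanes in a single instruction;
    the generator expression here only models the result.
    """
    return tuple(x * y + z for x, y, z in zip(a, b, c))


# Scale a position (x, y, z, w) and add an offset in one step.
position = (1.0, 2.0, 3.0, 4.0)
scale = (2.0, 2.0, 2.0, 2.0)
offset = (1.0, 1.0, 1.0, 1.0)
print(vec4_multiply_add(position, scale, offset))
```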
Joe Nash
Is it kind of like the kind of thing you'd use CUDA for? Sorry to interrupt, but is it like, kind of in that? Yeah. Okay, cool.
Tylo
Yeah. So there are two VUs. There's VU0, which is meant to be used as a coprocessor for the EE. The EE can just reach into its registers. It can actually use it as a coprocessor and execute single instructions on it directly, or you can send it a little microprogram that'll run through, usually something like 20 or 50 instructions long. And then there's VU1. The two VUs have the same architecture, but they're attached to different things and therefore expected to be used differently. VU1 is slightly further from the CPU, and it's meant to be used similarly to what, on a modern graphics card, would be the vertex shaders. It's got a thing that allows it to load stuff from memory into its internal working memory, then it runs the program over that, and then there's an instruction that takes whatever's in the internal working memory and sends it to the next chip on the PlayStation 2, which is the GS, the Graphics Synthesizer. The PlayStation 2 is old enough that it doesn't have actual shaders, but you can think of the GS as doing the part of the graphics workload that would now be done by fragment shaders. It starts by receiving lists of triangles that have already been transformed and are in 2D coordinates on the screen, with a Z value only for depth testing and nothing else. So the coordinates are already in 2D, and it just rasterizes the triangles and then colors them in with the texture. And that's about it.
Joe Nash
God, there's a lot going on. You just said that it doesn't have actual shaders. I guess the development of this console was at such a point where so much of what we now take for granted with 3D gaming was still being decided that the architecture is just kind of all over the place.
Tylo
Yeah, pretty much. It has something you could think of as similar to a vertex shader, but no fragment shaders, which is kind of funny. It's kind of the opposite of the GameCube, which has fixed-function vertex processing and semi-programmable fragment processing.
Joe Nash
Interesting. Okay, I guess this is just very PC brained of me, but it's also really interesting how the various parts of the graphics processing are split across so many different pieces of hardware. I guess we're so used to now being like there's the big monolithic graphics card, right.
Tylo
GPU, and it just processes things. But yeah, back then even the split was kind of different. The vertex shaders are much closer to the CPU. In fact, so close that when we emulate it, I know we've heard people asking us, why can't you put the VU stuff into vertex shaders? And it's like, the VU can so much more easily communicate with the EE than a modern computer's vertex shaders can communicate with the CPU that that just wouldn't work.
Joe Nash
So yeah, that brings me on to my next question: what is it like to emulate this architecture? I imagine there are trade-offs with the fact that you're working online. I imagine there was a lot of stuff going on in parallel in that architecture, right? That's difficult to catch.
Tylo
So that tends to cause a lot of fun for us, actually. It mostly works, I think, with the exception of a few games that really abused the ability to very tightly synchronize between the EE and the VUs. There are a few games like that that we just can't emulate properly because they expect ridiculously accurate cycle timings between the various coprocessors. We try pretty hard to track cycle timing within a single processor. The EE maybe a little less so, but the VUs we track very tightly, because they actually reveal some of their internal timings to the programs, so you kind of have to. I can go into that a bit more later if you want.
Joe Nash
Yeah, that's fascinating. Yeah, we can come back to that.
Tylo
Yeah. But outside of that, there are only a few games that abuse that hard enough that we break on them. The other fun ones are where the instructions themselves do things in weird ways that are hard to emulate, at least quickly. I think one of the more famous examples is the PlayStation 2's floating point math. Even on the GameCube, I remember Dolphin had a blog post on one instruction that was slightly off from the way you're supposed to do it according to the standard, and it was breaking all their replays of Mario Kart or whatever. But that was about it. That was one teeny little difference in how they were handling, I think, a fused multiply-add. In the PlayStation 2's case, the floating point math just completely ignores what's now standardized as how to represent a 32-bit floating point value.
Joe Nash
I saw an offhand comment about this, it might have been on the PCSX2 Wikipedia page, but their floating point's not to the IEEE standard at all.
Tylo
So it's kind of amazing. In PC floating point, the 32 bits are split into a single sign bit to say whether it's negative or positive, then 8 bits for an exponent, and then 23 bits for what is usually called, I think, a mantissa. It's pretty much the computer equivalent of scientific notation. Instead of saying a number is 57, you say it's one point something times two to some power. That's how floating point values are able to represent both really large values, like 10 to the 300ish, and ridiculously small values, like 10 to the negative 300ish. So that's the representation. But standardized floating point has a whole bunch of extra things to handle edge cases. At the very top of the floating point range, the highest value that can go into the exponent field is used for storing infinity and not-a-number. Infinity indicates that you've gone above a certain value, and it's there to stick around. If you go above the maximum float, it'll go to infinity. Then even if you try to divide that in half, it'll stay infinity, and if you try to divide it more, it'll stay infinity. Not-a-number is for things that go even more off the rails. If you try to take infinity and multiply it by zero, that's even more ridiculous, so that turns into not-a-number. And once you have a not-a-number, it just spreads around, because any operation that includes a not-a-number produces a not-a-number, to indicate to you, hopefully by the end, that something went wrong in this calculation. Like, we have no clue, right? And on the PlayStation 2 they were just like, who needs that? So the highest exponent value is not for infinities and not-a-numbers. It's just one higher exponent than the previous one.
Which means that the PlayStation 2 can represent floating point values that are twice the size of the maximum PC floating point value, because, remember, this is exponential. So that's the first part of the issue: any number that goes slightly higher than the maximum PC float accidentally becomes infinity on PC. It's kind of a minor issue when it's just, oh, they're big numbers. But when you remember that infinity is meant to spread, that's when things get really messy. If you multiply by zero on a PlayStation 2, you're guaranteed to get zero back, right? Every number is a non-infinity number, so you multiply it by zero and you get zero. But on a PC, you multiply infinity by zero and you get not-a-number.
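To make the bit layout concrete, here is a small sketch that decodes the same 32 bits two ways: as a standard IEEE 754 float, and under a PS2-style reading in which the top exponent value is just one more ordinary exponent. The function `ps2_style_value` is an illustrative assumption built from the description above, not emulator code, and treating a zero exponent field as zero is a simplification made for this sketch.

```python
import math
import struct


def split_fields(bits: int):
    """Split a 32-bit pattern into sign (1 bit), exponent (8), mantissa (23)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def ieee_value(bits: int) -> float:
    """Interpret the bits as a standard IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def ps2_style_value(bits: int) -> float:
    """Interpret the bits with no infinity or NaN (assumed PS2-style reading)."""
    sign, exponent, mantissa = split_fields(bits)
    if exponent == 0:
        return -0.0 if sign else 0.0  # simplification: no denormals in this sketch
    magnitude = (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
    return -magnitude if sign else magnitude


INF_BITS = 0x7F800000      # exponent field all ones, mantissa zero
PC_MAX_BITS = 0x7F7FFFFF   # largest finite IEEE single
TOP_BITS = 0x7FFFFFFF      # a NaN on PC, the largest value under the PS2-style reading

print(ieee_value(INF_BITS), ps2_style_value(INF_BITS))
print(ps2_style_value(TOP_BITS) / ieee_value(PC_MAX_BITS))
print(ieee_value(INF_BITS) * 0.0, ps2_style_value(INF_BITS) * 0.0)
```

The last line shows the spreading problem: the IEEE reading turns a multiply by zero into not-a-number, while the PS2-style reading gives zero back.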
Joe Nash
NaN. You've just opened a door for me. I feel like PlayStation-to-PC porting has been the source of some really cursed ports in the past, and I didn't know that this was an issue. And now I'm just like, oh God, the amount of errors this brings up.
Tylo
If you want some fun on that specifically, the Dolphin blog has a post about a game that was ported from PlayStation 2 to GameCube. I think it was True Crime: New York City. They mention that this is one of the games that ended up making Dolphin emulate floating point exceptions. Of course, this goes right back to the PlayStation 2 floats, where you multiply by zero and so on. In their case, I think when they ported to the GameCube, their AI must have broken, so they added a division-by-zero exception handler that would just say, oh wait, you just divided by zero, let me replace the answer. I think it's just zero, or maybe not. I don't remember what the exact number is. But whenever they accidentally divided by zero, they'd catch the exception and replace the number with a benign value that wouldn't blow things up. And then Dolphin had to make sure that they could actually emulate these division-by-zero exceptions, all because of a game that was scooted over from a PlayStation where division by zero didn't result in, I guess in this case, infinities or not-a-numbers that stuck around.
Joe Nash
Incredible. Sorry, I interrupted you mid-flow. So we had the issue of numbers that can be much bigger on PlayStation. And then, of course, when you're working on a different platform, you've got the propagating infinities. And then I think you were about to say there was another issue that arose from this.
Tylo
So I guess I'll mention some of the things that we do to work around that. Because of that, if you look through, I think, the advanced settings in PCSX2, there are a number of clamping modes. What that means is that after every calculation we do with these floating point values, if you have the clamping modes enabled, it will do a min and max with the PC min and max floats to remove all of the infinities and turn them into just large numbers, which will at least multiply by zero properly, in the PS2's opinion of properly, even if they aren't quite the same number. That's how we try to deal with it, and it doesn't work for every game. So I think we have a bunch of settings to enable various modes for specific games when we know that they break less that way.
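The clamping idea can be sketched as a min and max against the largest finite single-precision value. This is a simplified illustration of the approach described above, not PCSX2's implementation; in particular, passing not-a-number through unchanged is a simplification made only for this sketch.

```python
import math
import struct

# Largest finite IEEE 754 single-precision value (bit pattern 0x7F7FFFFF).
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]


def clamp_result(value: float) -> float:
    """Replace +/- infinity with the largest finite float of the same sign."""
    if math.isnan(value):
        return value  # NaN handling is deliberately left out of this sketch
    return max(-FLT_MAX, min(FLT_MAX, value))


overflowed = float("inf")            # what a PC produces when a result overflows
print(overflowed * 0.0)              # not-a-number, which then spreads
print(clamp_result(overflowed) * 0.0)  # zero, matching what the game expects
```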
Joe Nash
Right.
Tylo
But there are still games where the AI just doesn't quite work correctly because we aren't quite doing the math correctly. Then there's one last bit of fun with the PlayStation 2 floats. That first problem was with the really big numbers. The other problem we have is with small changes in the low bits of a number, the low bits being the ones that make the least difference, like the last digit when you have 1.0034. The IEEE standard, the standard behind PC floating point, says that, at least in the default mode, when you do math like adding or subtracting values, you should round the result as if you had infinite precision when you did the calculation and then rounded it to the nearest float. To do that, they employ, I think, three extra bits of precision when they're doing the calculation. I think one tracks whether you're right below or above 50%, and one tracks whether anything below that has ever gone above zero. Pretty much, it's to be able to do round-to-even rounding. If you think of doing a round-to-even: if it's below 0.5, you round down. If it's above 0.5, you round up. But you have to know if it's exactly 0.5, at which point, in the case of floating point math, you round towards the even number. That requires an extra three bits in every calculation. And apparently whoever was designing the PS2 was like, that's too many bits to keep track of, right? So they just don't. I think we have figured out that their addition uses just one guard bit, which is what those are called. And we had to figure that out because apparently some game does decryption of its game binary using the floating point math unit. If you do it even slightly wrong, it breaks everything, because they're decrypting data or something like that.
So we specifically have a thing in there to truncate the numbers before we do math with them, to make sure that these extra bits of precision are deleted. And then the other one is multiplication. I don't think anyone's fully looked into exactly what it does in its math, but what we do know is this: usually you'd expect that multiplying a number by one will get you that number back, but on a PlayStation 2, there are certain numbers that will change slightly if you multiply them by one.
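The difference between round-to-nearest-even and simply dropping the extra bits can be shown on plain integers. This sketch does not model the PS2's adder or its single guard bit; it only contrasts the two rounding rules discussed above, using three extra low bits as in the IEEE description.

```python
def round_nearest_even(value: int, extra_bits: int) -> int:
    """Drop `extra_bits` low bits, rounding to nearest with ties going to even."""
    kept = value >> extra_bits
    remainder = value & ((1 << extra_bits) - 1)
    half = 1 << (extra_bits - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    return kept


def truncate(value: int, extra_bits: int) -> int:
    """Drop `extra_bits` low bits with no rounding at all."""
    return value >> extra_bits


# The three low bits after the underscore are the extra precision.
for value in (0b1010_100, 0b1011_100, 0b1010_101, 0b1011_111):
    print(bin(value), round_nearest_even(value, 3), truncate(value, 3))
```

Whenever the dropped bits are more than half, and for every tie that lands on an odd value, the two rules disagree by one in the last place, which is enough to break something like a decryption routine.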
Joe Nash
How many certain numbers are we talking here? Are we talking very big ranges, or weird things, like one specific, very precise number will mutate?
Tylo
I think it's ranges. One of the other PCSX2 devs, named Phobes, went over this on their blog. I think they were actually going to be in the interview, but they couldn't make it in the end. It would have been cool to have them explain that, because I'm not quite as familiar with it, other than the fact that there are definitely numbers that you can multiply by one and they'll just lose a bit. They'll just shrink by one bit. Oops.
Joe Nash
So you mentioned the Graphics Synthesizer earlier as an area that you've worked a lot on. How did you get into that in particular? Was that an accident, because of starting out with the OpenGL thing, or was there something about that chip in particular that interested you?
Tylo
Yeah, so initially I worked a little bit on it to get OpenGL working, and then I was like, oh hey, it works, yay. And I set it down. I guess I didn't mention this earlier, but the next thing I worked on was that Apple had just wiped 32-bit support off of their operating system the year I started working on this. I had done a lot of my work originally by using an older computer that hadn't had 32-bit support wiped off of it. But to actually get it working on my main laptop, the next thing was, oh, let's get this working on 64-bit. So I went over there for a while, and my first ever PR to the project was actually for getting 64-bit support. The Mac stuff came later, mostly because that was a project that didn't affect anyone outside of macOS, right? And also because the developers were like, you know, we deleted all these things four years ago because we were sick of maintaining them. We're not sure we want all of those hacks for really old OpenGL stuff back. So one of the things that was in the back of my mind at that time was, well, the reason Apple's abandoned OpenGL is that they want everyone to use their new API, Metal. So maybe I should consider writing a Metal renderer for PCSX2. That was in the back of my mind for a while, so that was part of it. The other part was that during my time working on the 64-bit support, I got brought into the rest of the PCSX2 team and joined their Discord. I said hi and got to know everyone pretty well by the time the 64-bit stuff actually got merged, because that took a little while.
One of the other people in the Discord did a lot of their work on the graphics stuff, and I was like, oh, it looks kind of fun. So I think I played around with some small fixes to it before I attempted to write a Metal renderer for PCSX2. And yeah, the first attempt didn't go especially well. The one that's now in PCSX2 was attempt number two. During my first attempt at writing the Metal renderer, I found a lot of things that I wanted to change about the graphics code to make it more friendly to adding new renderers. So a little while later I went through and did an overhaul to make it easier to add new renderers, and then added the Metal renderer, the final version, the actual published version.
Joe Nash
Yeah, that's always a very satisfying genre of project, making something extensible. I think it's a nice area to work on. You've mentioned a bunch about Apple graphics as we've gone along, and it's been very surprising to me. I knew they had the Metal API. I guess I didn't realize how quickly they deprecated everything else. I know very little about the world of Apple graphics, and especially now that they're completely over in custom chip land, I feel like it's completely invisible. Can you tell us a little bit about how Apple approaches graphics processing?
Tylo
I guess, yeah. So from the API standpoint, especially earlier, they were still running Metal on the same graphics chips that everyone else was using. Obviously the hardware could do the things that OpenGL supports, because that's why they're in OpenGL, so the feature set was kind of similar to newer OpenGL. The OpenGL that Apple supported was just missing major features. You've probably heard of compute shaders, right? The ability to run random things that aren't graphics on your GPU. Apple's just missing that from its OpenGL, because it got introduced after Apple abandoned OpenGL, I think in like 2009. So pretty much everything since then, they don't have. As for Metal, if I had to describe it, I'd say it's most similar to a modernized DirectX 11. Vulkan and DirectX 12 are very different from DirectX 11. Metal is kind of like what you'd get if you tried to bring the important things that are in Vulkan and DirectX 12 to DirectX 11 instead of doing a whole new API. That would be my description of Metal.
Joe Nash
Yes. That sounds very Apple-y to me.
Tylo
Yeah. The other very Apple thing that they did is they were just like, anything legacy, goodbye. So that covers everything in DirectX that you're recommended not to use in the future. Geometry shaders is a really big one. Geometry shaders was the big one where Metal was like, nope, no geometry shaders. Previously everyone was like, right, they added geometry shaders and they turned out to be really slow, so don't use geometry shaders, but they're still here. And Apple was like, nope, no geometry shaders, sorry. Which caused a lot of people a lot of pain, including us for a bit. Thankfully it didn't affect PCSX2 too much, because we didn't use geometry shaders for much, but we did use them for a few things. That's what took the longest to come to the Metal renderer. For a long time we didn't have the ability to upscale points and lines in the Metal renderer because that required geometry shaders. By required, I mean our code for it was using geometry shaders. We obviously didn't require them, because we switched off of them.
Joe Nash
Not as in an immutable law, like the only way to implement that was geometry shaders. That's just what you were doing.
Tylo
Yeah, it was a very common thing, especially because it's the straightforward thing, right? You have a point, and you send your point to the geometry shader, which turns your point into four vertices. And that sounds like the kind of thing that a geometry shader would be good for, because that's what they were for, right? But it turns out they apparently just weren't that fast at even simple things like that. If you're wondering about the new version, I actually learned it from an AMD presentation on how to do this faster than geometry shaders on all GPUs. You take advantage of the fact that newer GPUs can just load from memory however they want in the vertex shader. Modern GPUs either have or pretend to have dedicated hardware for loading vertex data. I think AMD GPUs only pretend to have it, though I'm not sure. Apple GPUs definitely only pretend to have it. So you ignore that hardware, if it even exists, and manually fetch the vertices. Instead of using a shader to expand one point into four vertices, you say there are four vertices, and then you divide the vertex index by four, so that the four adjacent vertices load from the same single point's data, and then you just offset each one. It ends up working out and being faster than geometry shaders, at least according to the AMD presentation that I watched. I never actually benchmarked it myself.
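The vertex-index trick can be modeled on the CPU. In this sketch the loop stands in for a vertex shader that is invoked four times per point and fetches its own data; the corner order, the `size` parameter, and the function name are assumptions chosen for illustration rather than details of any real renderer.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Offsets for the four corners of the quad drawn around each point.
CORNERS = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]


def expand_points(points: List[Point], size: float) -> List[Point]:
    """Emit four vertices per point by manually fetching with vertex_id // 4."""
    vertices = []
    for vertex_id in range(4 * len(points)):
        x, y = points[vertex_id // 4]    # four adjacent vertices read the same point
        dx, dy = CORNERS[vertex_id % 4]  # the low two bits of the index pick the corner
        vertices.append((x + dx * size, y + dy * size))
    return vertices


print(expand_points([(10.0, 20.0)], size=2.0))
```

Because the draw call simply asks for four times as many vertices, no geometry shader stage is needed to create the extra ones.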
Joe Nash
So, all the low-level details of graphical techniques always just seem like witchcraft. That's awesome. And yeah, good to understand the whole Apple API journey. It's very Objective-C to me, what you said about making something new that's kind of the old thing with all the things imported. One of the things we've heard from emulator devs in the past, and I think you mentioned it when we first started talking, is that in your journey working on the emulator, and especially on the GS, you come across weird ways that games have used it. What are your favorite cursed games that you've had to deal with on the emulator?
Tylo
Yeah, let's see. I think we'll go with SotC, Shadow of the Colossus. They have this bloom effect that they run on the GS, and the way they do it is they accumulate this texture over each frame. I don't remember exactly, but I think what they do is take a silhouette of all of the things they want to put this bloom effect on, make that white, and then spread it around by rendering it slightly above, slightly to the left, slightly to the right, and slightly down. Because remember, they don't have shaders, so you can't just sample the texture in three places. So they just render it four or five times over the same image. The GS didn't have shaders, but it had a lot of fill rate, meaning the ability to render a lot of pixels very quickly for the time. So that's just what they did. But the fun part is that they'd accumulate this texture, and because each frame they blur it slightly more, the white part would come out in kind of a halo, which is the effect you want. But as it turns out, the thing preventing its spread from continuing too far was the fact that the PlayStation 2's blending rounds down. Remember, they're spreading this by taking the image and rendering it with like 10% opacity many, many times. Each time you do that calculation, it's 10% of this one plus 90% of that one, put together, right? Most PC GPUs, as far as I know, will round to the nearest value. If it's more than 50% of the way to the next value, they round up, and if it's below 50%, they round down. But the PlayStation 2 truncates.
So even if it's 99% of the way to the next pixel value, it goes all the way back down. And so the only thing preventing this bloom from just exploding across the screen and turning the entire thing white is the fact that the blending here rounds down. And the fun part is that, right, on PC GPUs these days, everything's shaders, right? It's all programmable except the blending. The blending is not programmable, except on Apple GPUs, as it turns out, and Intel ones. But on AMD and Nvidia GPUs, the blending is not part of the shader. The shader outputs a color, and then it goes into a special hardware unit that combines the color from the shader with the color that's already in the texture. Which means that it's not very easy to modify. Right, like with everything in the shader, we can just be like, oh, we'll just calculate it slightly differently. But once it's sent to the hardware blend unit, the hardware blend unit is now in charge and we can't really do much about it. And so Shadow of the Colossus is, I think, one of the more famous games for this, but we have lots of fun with blending on the PS2. And so we actually had a whole bunch of attempts at trying to make this work better. So, like, one of the things was, the PC blending, right, the main calculation is driven by the alpha value, where one would mean fully opaque. So that would be the alpha value times the source color plus one minus alpha times the destination color, right, of the texture. And so one of the things we tried was, well, we can't mess with the value once it's gone to the blend unit. Because the big issue here is that the shaders run super in parallel, right? Your modern big Nvidia or AMD GPU has like 10,000, something like that, CUDA cores or stream processors or whatever each manufacturer calls them.
So like 10,000 of these mini CPU-ish things. They're not really a full CPU. I mean, they're closer to a single lane of a SIMD unit. But anyway, like, 10,000 things that'll be calculating on pixels at once. And then the way GPUs get good memory bandwidth is by just running a whole bunch of things at the same time on each one of those. So not only are there like 10,000 calculations happening at once, but there are also probably about 10 times that many pixels that are just waiting for their memory accesses to finish. So you're looking at like 100,000 or something pixels that are in flight at once. And, you know, if your triangles overlap and they need to blend with each other, that needs to happen in a specific order, right? If you put the first pixel down and then the second on top of that, and they're blending with each other, that'll look different from if you put the second pixel first and the first one on top. And so this is the reason that they don't want your shaders being able to load from the texture that you're rendering to, because then they're going to have to order themselves, which would be not great. And so this is the reason that you can kind of do whatever you want as long as you're not looking at the texture you're rendering to. But once you need something that looks at it, you have to send it to this special hardware blend unit that figures out what order everything needed to be in, you know, reorders them if it needs to, or however it does it, who knows? That's the manufacturer's deal, not ours. And then it blends them in order. And so we can do whatever we want up until then. And so we're like, well, what if we do that first source texture times source alpha multiplication in the shader, and then we subtract a little offset to do the opposite of what the PC does, right? So the PC rounds at 50%. So what if we subtract half a texture value, right?
Half a texture value, so we just bring it slightly lower, so that when it does the 50% rounding, it's now actually rounding a value that's slightly smaller than what it originally would have been, and then it'll hopefully round closer to the correct way. And that was the hope.
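The rounding difference described in this answer can be sketched with exact integer arithmetic. This is only an illustration under stated assumptions, not GS or PCSX2 code: colours are 8-bit integers, alpha is a whole percentage, and every function name here is hypothetical. It shows a low-opacity blend toward black fading out completely when the fraction is truncated, getting stuck at a small residue when rounding to nearest, and the "subtract half a step first" offset making round-to-nearest agree with truncation when no precision is lost on the way to the blend.

```python
# Illustrative model only. Values are scaled by 100 so that a percentage
# alpha never produces fractions. None of these names come from PCSX2.

def blend_numerator(src, dst, alpha_pct):
    """src*alpha + dst*(1 - alpha), scaled by 100."""
    return src * alpha_pct + dst * (100 - alpha_pct)


def blend_truncate(src, dst, alpha_pct):
    """Drop the fractional part, as the transcript describes for the PS2."""
    return blend_numerator(src, dst, alpha_pct) // 100


def blend_nearest(src, dst, alpha_pct, bias_hundredths=0):
    """Round to nearest (ties up), as described for most PC GPUs.

    bias_hundredths models an offset subtracted in the shader before the
    blend unit rounds; 50 means half of one colour step.
    """
    num = blend_numerator(src, dst, alpha_pct) - bias_hundredths
    return max(0, (2 * num + 100) // 200)


def biased(src, dst, alpha_pct):
    """Round-to-nearest blend with the half-step offset applied first."""
    return blend_nearest(src, dst, alpha_pct, bias_hundredths=50)


def fade_to_black(blend, start=255, alpha_pct=10, frames=300):
    """Repeatedly blend black over one pixel at low opacity."""
    value = start
    for _ in range(frames):
        value = blend(0, value, alpha_pct)
    return value


if __name__ == "__main__":
    print("truncating blend settles at", fade_to_black(blend_truncate))
    print("nearest blend settles at", fade_to_black(blend_nearest))
    print("biased nearest settles at", fade_to_black(biased))
```

In this toy model the residue left behind by round-to-nearest is the kind of leftover brightness that, accumulated frame after frame, lets a halo keep growing.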
Joe Nash
The hope. Did that work out okay?
Tylo
It worked on AMD GPUs, right? It turns out there's no requirement that the shader output have any more bits in it than the actual texture it's going to. And so on Nvidia GPUs, it turns out they were half precision floating point, 16 bits of data with like 11 bits of precision or something like that, and they were truncating it right back down to eight before they sent it to the blend unit to, you know, I'm sure it saves a little bit of bandwidth, right? But it completely undid everything we were trying to do. It just undid it. So we do have SOTC looking nice on AMD and Intel, but I think on Nvidia, you have to raise the blending accuracy setting. In PCSX2, the big hammer for this is there's a blending accuracy toggle. And as you bring it up, we switch more and more draws to software rendering, where we really do actually load the value that's currently in the texture, look at it, combine them and put it back. And we make GPUs really, really unhappy. Because to do that, we of course now have to order all of these pixels. And, you know, as software, we don't really know which pixels are and aren't ordered relative to each other. So our big hammer approach to this is that we draw one triangle at a time. And after every triangle, we tell the GPU, please flush all your caches and make sure that the shader can read the value that was just rendered. Okay, one more triangle. All right, flush all the caches again. So it's very expensive. GPUs hate it, but it does get you the correct pixel values.
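Why cutting the shader output back to 8 bits undoes the offset can be sketched the same way. This is a toy model, not a description of any real GPU: it assumes the shader's value is reduced to whole colour steps by dropping the fraction before the blend unit adds the destination term and rounds to nearest. All names are hypothetical.

```python
# Toy model: values are scaled by 100 so a percentage alpha stays exact.

def truncating_blend(src, dst, alpha_pct):
    """Reference result: blend and drop the fraction (PS2-style)."""
    return (src * alpha_pct + dst * (100 - alpha_pct)) // 100


def round_nearest(num_hundredths):
    """Round a value given in hundredths to the nearest step (ties up)."""
    return max(0, (2 * num_hundredths + 100) // 200)


def offset_blend(src, dst, alpha_pct, keep_fraction):
    """Shader computes src*alpha minus half a step; blend unit does the rest."""
    shader_out = src * alpha_pct - 50  # in hundredths of a colour step
    if not keep_fraction:
        # Assumption: output is cut to whole 8-bit steps, fraction dropped.
        shader_out = max(0, shader_out // 100) * 100
    return round_nearest(shader_out + dst * (100 - alpha_pct))


def count_mismatches(keep_fraction, alpha_pct=10):
    """How many (src, dst) pairs differ from the truncating reference."""
    return sum(
        offset_blend(s, d, alpha_pct, keep_fraction) != truncating_blend(s, d, alpha_pct)
        for s in range(256)
        for d in range(256)
    )


if __name__ == "__main__":
    print("mismatches with fraction kept:", count_mismatches(True))
    print("mismatches with fraction dropped:", count_mismatches(False))
```

With the fraction kept, the offset trick reproduces the truncating result for every pair in this model; once the fraction is dropped, pairs such as a black source over a destination of 5 come out one step too bright.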
Joe Nash
Incredible.
Tylo
And so we have this big blending accuracy slider. It's a dropdown menu, but with like six different options. And the only thing those different options do is they increase the number of triangles that we apply our big hammer to. So, yeah, in the basic mode, we tracked down what most games tend to do that ends up requiring the software blending approach, and we only do it for those. And that usually only adds a very small overhead because, you know, when you're only doing it for like 50 triangles, it's like, oh, well. But then you go to medium and then high, and on high, some games are looking at like 10,000 of these: single triangle followed by a barrier, followed by a single triangle followed by a barrier. And AMD GPUs especially hate this. And you get like 10 FPS or something, you know, some terrible, terrible frame rate on even big, powerful GPUs. And they use very little power while they're doing it because most of the GPU is just sitting there waiting. It always looks kind of funny. It's like, oh, yeah, my massive GPU is using like 30 watts, but it's trying really hard. But it's only using 30.
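The relationship between the accuracy level and the number of barriers can be sketched as a simple count. Everything here is hypothetical: the three level names are the ones mentioned in the conversation (the real menu is said to have about six), and the per-draw classification and triangle counts are made up for illustration. The only idea taken from the discussion is that each software-blended triangle costs one barrier.

```python
from dataclasses import dataclass
from typing import Optional

# Level names as mentioned in the interview; the real setting has more options.
LEVELS = ["basic", "medium", "high"]


@dataclass
class Draw:
    triangles: int
    # Lowest level at which this draw is switched to software blending,
    # or None if hardware blending is always good enough (hypothetical field).
    needs_level: Optional[str]


def barriers_for(draws, level):
    """One barrier per triangle for every draw handled in software at this level."""
    rank = LEVELS.index(level)
    return sum(
        d.triangles
        for d in draws
        if d.needs_level is not None and LEVELS.index(d.needs_level) <= rank
    )


# A made-up frame: most triangles never need the expensive path.
FRAME = [Draw(50, "basic"), Draw(2000, "medium"), Draw(8000, "high"), Draw(30000, None)]

if __name__ == "__main__":
    for level in LEVELS:
        print(level, barriers_for(FRAME, level))
```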
Joe Nash
It's really suffering, but it's using no power to do it. Yeah, okay. So I guess with all that in mind, it seems like emulating the PlayStation is just really hard. How close to true to the original console would you say we are overall on PlayStation 2 emulation?
Tylo
It really depends on how true do you need?
Joe Nash
Right, yeah.
Tylo
One of the things we do have is a software renderer, which is where we do all of the pixel, everything. We do the work that you would normally do on a GPU, we just do it on the CPU. And especially for a long time, it's gotten a lot better now. You know, we properly detect a lot more games' weird things now, so we're a bit better at it. But in, like, PCSX2 1.6, there were so many games that just wouldn't work well on the hardware renderer. So unlike, for example, Dolphin, where the software renderer is kind of meant for developers to just verify things, the PCSX2 software renderer is meant for actual users to use, which means it's not actually fully accurate, which is a bit funny, but it uses memory the same way that the real GS does, and that helps a lot of things. And so that's one of the ways you can get a bit closer, is to use the software renderer. It runs full speed in many games, especially on modern CPUs, like the Ryzen 5000 series. Most games will run full speed on the software renderer, which is pretty cool. And that gets you much better support these days. They're a lot closer, but there are still definitely games that run better on the software renderer. Then from the EE side of things, where we have, like, the synchronization of whatever, there's a specific game. The engine is, I think it's called the Blue Shift engine. Is it Marvel, Rise of something something? There's a specific game that will kick off a calculation that happens, like, cycle timed in parallel between the EE and both of the two VUs. And, hilariously, on the issue for this game, we were discussing it and one of the developers from that game came over and posted the code that they had written that we were screaming about internally. They posted it to their Twitter, and they posted a link on that thing. And so I got to actually go look at the actual source code for that game, and it was like, wow.
So, yeah, so it runs on the EE on a single cycle. So the EE is a superscalar MIPS processor and it is able to run two instructions at once, which I'm sure at the time was amazing. Nowadays CPUs run like 12, or maybe not quite 12, but like 8-ish instructions at once. But at the time, two instructions at once, wow, right? It can run two instructions at once as long as they don't conflict with each other. And so they start by running one instruction to start VU0 and one instruction to start VU1, making sure that they happen on the exact same cycle. And then they counted the cycles for every instruction from there. So they know exactly, right, which instruction on the CPU is running at the time of each instruction running in each of the VU programs. And so then on the CPU, they can just kind of be like, oh, I know the VU will be done with this. Let me just yoink that right over there. And then have VU0 just yoink from the VU1 registers. Because VU0 can kind of just reach into VU1's registers if it really wants to.
Joe Nash
Right, right. And so this is where what you mentioned earlier, that the VU exposes its timings to programs, comes in.
Tylo
Yeah. So they kind of just reach into each other's stuff, and then they expect to obviously be running two instructions from the EE, one from VU0, one from VU1, two more instructions from the EE, one from VU0, one from VU1. And currently our best is, like, to synchronize after every eight instructions or whatever. And even that's really slow or something. So the current opinion on that one, from one of our other developers, was like, I don't think, maybe in 10 years. Probably not in 10 years, but yeah. So there's always going to be some games that are too timing sensitive for us.
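Why synchronizing every few instructions is not enough for code like this can be sketched with a deliberately small model. The assumptions are loud ones: one instruction per cycle on each core, the reading core is emulated first within each time slice, and so a read only sees writes made before the current slice began. None of this reflects how PCSX2 actually schedules its cores; the function names are hypothetical.

```python
def visible_on_hardware(write_cycle, read_cycle):
    """In lockstep hardware, a read sees every write from an earlier cycle."""
    return write_cycle < read_cycle


def visible_when_sliced(write_cycle, read_cycle, slice_len):
    """With slice-based emulation (reader first), a read only sees writes
    that happened before the start of the reader's current slice."""
    slice_start = (read_cycle // slice_len) * slice_len
    return write_cycle < slice_start


def count_wrong_reads(slice_len, horizon=64):
    """Count (write, read) cycle pairs where sliced emulation disagrees."""
    wrong = 0
    for write_cycle in range(horizon):
        for read_cycle in range(horizon):
            if visible_on_hardware(write_cycle, read_cycle) != visible_when_sliced(
                write_cycle, read_cycle, slice_len
            ):
                wrong += 1
    return wrong


if __name__ == "__main__":
    for slice_len in (1, 2, 8):
        print("slice length", slice_len, "wrong reads:", count_wrong_reads(slice_len))
```

Only a slice length of one cycle matches the lockstep behaviour in this model, which is the expensive case the answer above describes.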
Joe Nash
Yeah, absolutely. Yeah. All right. So weird VU timing stuff, right?
Tylo
Yes. So on most CPUs, so I mentioned, like, modern CPUs can run 8, 10 instructions at once, right? The way they do this is they kind of inspect your program pretty much as they're running it. And they're like, oh, hey, you wanted to do this instruction and then this other instruction. And this instruction is multiplying the values from register 2 and register 3 and putting the result in register 4. This one's multiplying the values from 3 and 5 and putting it in 6. Well, there's no reason I couldn't just run both of those at once, right? And so it just does. And it just does this across, like, I think modern CPUs are tracking hundreds of instructions at the same time to figure out which ones are ready to run based on having all of their inputs ready. But the important thing that they try to make sure of is that no matter what they do internally, to make developers not go crazy, they act as if they ran each instruction one at a time, in order. They only run instructions out of order if they are able to undo everything if something explodes or whatever, to make sure that they act as if they were running each instruction one at a time, in order. And from a developer standpoint, that's nice, because I don't have to think about what my CPU is going to be running out of order as I write the instructions for it, if I were writing assembly. Obviously most people aren't writing assembly these days, but the compiler doesn't have to worry about this thing, you know, whatever, everyone's happy. But, you know, there's a lot of circuits being used to track all that stuff, right? So what if you just, you know, didn't have them? So on the PS2, they have a pipelined vector unit that can do operations on four floating point values at once. And for the most part, they actually do have this kind of tracking on it. And so if you say, right, you're like, oh, let me multiply the values in registers 2 and 3 and put the result in 4.
And then, instead of 3 and 5 with the result in 6, what if you said 3 and 4 and put the result in 6? Well, now you have to wait for the first one to finish before the next one can start, because it needs the output of the previous one. And so for multiplication on a VU, that does happen. But then you go to division and you're like, okay, let me divide this by this. And there's actually only one division output register, Q. And so the result goes in Q. And then what if you read Q and use that for something else? Well, it says, go ahead, you can just read it. The value in there is the one from the last division to have actually finished. By the way, divisions take seven cycles. So seven cycles after you start a divide, the new division result will appear in that division register and overwrite the old one. But until then, you can just yoink out the previous value. Hey, why not, right? It saves them some circuitry to track this. And it also means that you kind of have this one extra register's worth of data, because you can start the calculation that targets the division register and then take the previous result, instead of having to take it first and then having to store it somewhere. And so everyone's happy, except for all of the people who are trying to actually write code and schedule things. It makes things painful when you're writing code, because, right, now you're like, okay, I start my division and then I do some other stuff and then I get my division result, which means I have to have something else to do while I'm waiting for that division, right? So from a programmer standpoint, you can get more performance, but you have to work a lot more as the programmer. But then from an emulator standpoint, we now have to track all of the things that are in flight.
Because, of course, the people who were writing the code clearly were tracking, in their heads or in a helper program or something, how many cycles everything was taking and how many cycles it had left before it appeared in the division result register and overwrote the previous thing. We have to make sure that we do the same thing now, right? And so, yeah, the solution for that is, when we do our recompiler for the VUs, we store that as a part of the, like, you know, normally when you're doing it, you're like, oh, well, I'm going to recompile the code at this address, right? But now you're like, I'm going to recompile at this address with five cycles left on the Q register and, you know, three cycles left on the, there's a, like, sine cosine thing, and that has its own output register, right? So three cycles left on that one. And then we will compile a different version of the code if you enter that block with a different number of cycles remaining on the various registers. And so that kind of works out fairly well for the VU-only code. But remember how I mentioned that the EE can just execute VU instructions on VU0? Well, as it turns out, we don't have quite that much coordination between our EE recompiler and the VU recompiler. So at the moment that just doesn't work. And at the moment our best solution is that we have something that's able to at least detect when it might be happening. And then we put up a big warning message and say, if your game breaks, you should contact the PCSX2 devs and they'll make a patch for your game to reorder the instructions. So it won't work on a PS2, but it will work on PCSX2. So hopefully at some point we'll get that actually working. But at the moment that's how things are going.
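The two ideas in this answer, a division result that only lands in Q seven cycles after the divide starts, and recompiled blocks keyed by how many cycles are still outstanding on entry, can be sketched roughly as follows. This is a minimal model under assumptions, not the PCSX2 recompiler: the class and method names are hypothetical, the "compiled block" is just a label string, and the sketch ignores what real hardware does if a second divide starts while one is still pending.

```python
DIV_LATENCY = 7  # cycles, the figure quoted in the interview


class DivideUnit:
    """Q keeps its previous value until the pending divide completes."""

    def __init__(self, initial_q=0.0):
        self.q = initial_q
        self.pending_result = None
        self.cycles_left = 0

    def start_divide(self, a, b):
        # Simplification: a new divide simply replaces any pending one.
        self.pending_result = a / b
        self.cycles_left = DIV_LATENCY

    def tick(self):
        """Advance one cycle; publish the result when the latency expires."""
        if self.cycles_left > 0:
            self.cycles_left -= 1
            if self.cycles_left == 0:
                self.q = self.pending_result
                self.pending_result = None


class BlockCache:
    """Caches one translation per (address, pipeline state on entry)."""

    def __init__(self):
        self.blocks = {}
        self.compiles = 0

    def get(self, address, q_cycles_left, other_cycles_left):
        key = (address, q_cycles_left, other_cycles_left)
        if key not in self.blocks:
            self.compiles += 1
            # Stand-in for generated code specialised to this entry state.
            self.blocks[key] = f"block@{address:#x} q={q_cycles_left} other={other_cycles_left}"
        return self.blocks[key]


if __name__ == "__main__":
    unit = DivideUnit(initial_q=1.0)
    unit.start_divide(10.0, 2.0)
    for cycle in range(1, DIV_LATENCY + 1):
        unit.tick()
        print("cycle", cycle, "Q reads", unit.q)
```

Entering the same address with a different number of cycles left produces a different cache key, so the same guest code may be translated more than once, which is the trade-off the answer describes.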
Joe Nash
Yeah. Actually that brings me around to another question I had, which is I guess just about how you work as a team. I feel like emulators are in this really weird space where your users aren't necessarily technical. They don't necessarily understand how consoles work, they just want to play PlayStation 2 games, and they're kind of treating you as a product, but all of it is very, you know, technically intense, on a case by case basis. How do you, I guess, go about thinking about progress of the emulator as a team? Is it, you know, very driven by what game issues are being reported? Is it overall accuracy to the platform? Like, how do you approach this?
Tylo
Yeah, one of the big things about this, right, is that a lot of people are working on this because they enjoy working on it, right, and not because they're being paid to work on it or something. In fact, a lot of the PCSX2 devs specifically, explicitly don't want to accept money because they don't want to feel like they have to, you know.
Joe Nash
Becomes a different thing when you're paid for it.
Tylo
Yeah, they don't want the responsibility that comes with that. So, yeah, part of it is kind of just that people work on the things that they're interested in working on, and so some people care more or less about making sure that when a bug report comes up, it gets handled. We do have a bunch of people who are at least, you know, triaging those for us. But trying to choose which actual games to prioritize, trying to make them work, is kind of up to each developer. It's their call, like, you know, what games they care about, is this a bigger issue, you know, how many games are affected by that. Each person will decide for themselves how they want to do that. So yeah, we communicate in a, you know, in a Discord server. We know each other pretty well, or at least as well as we would as a bunch of people on the Internet, right? And I feel like we usually get along pretty well, and so that's at least good.
Joe Nash
Yeah, that makes a lot of sense. Yeah, I guess. Yeah. I think it's particularly interesting with PCSX2 because I think the project website makes it look like such a, like a professionally run project.
Tylo
Right.
Joe Nash
It's always fun to hear how the open source projects actually happen behind the scenes.
Tylo
Even that, that website's pretty new. Up to like two years ago, it was a website that looked kind of like it came out in 2005, probably because it did. So yeah, we have one of the, I don't actually know if they're newer on the team. I think they've been on the team longer than I have, which means I don't know, actually. I don't think they were someone who worked too much on the emulator, but they were like, hey, you know what, I want to work on the website. And they made this fancy new website for us, and now our website looks at least somewhat modern. Actually, it's very modern. It looks very nice.
Joe Nash
Do you have, along the way, aside from the core emulator, is there like a suite of tools that y'all have built out internally to, I guess, debug or make progress on the emulator?
Tylo
Yeah, I think each person kind of uses their own different things. So since I'm on macOS, there's really only one. I guess I should mention this is from the perspective of debugging graphics stuff specifically. So Apple has a general graphics debugger for their Metal API, and so I often use that. And then I know a lot of people on the Windows side will use a program called RenderDoc, which does a very similar thing and can be used with any game that's using, for RenderDoc, Vulkan or DirectX 11. Yeah, 11 or 12. It kind of just, like, you capture a frame, and it takes all of the draw calls that made up that frame and puts them in a list, and then you can just go through them and look at each one in turn and be like, oh, here's a draw call that did this. But PS2 games tend to be kind of unruly in how they do this because of the way the PlayStation 2 works. And so another alternative we have is a draw dumping system, where you can check a bunch of boxes in the emulator and, for every single draw the game does, it will create a new PNG in a folder, and it'll just number them. And in fact it'll make not just a single PNG, but I think like seven PNGs and maybe two text files saying stuff about the draw, because, you know, it's the kind of stuff that you would see in RenderDoc, except instead of all of the screens from RenderDoc it's PNGs and text files, and it just fills your folder with gigabytes of PNGs, and then you scroll through them with the icon view of your file browser. So I've used that a bunch as well. It's useful for tracking down lower level things about what the game is doing. Right, the question is, is the game doing something really weird, or, oops, I did a dumb thing in my graphics code? And Xcode or RenderDoc is better for the "oops, I did a dumb thing in my graphics code" case.
Whereas if the game is doing something especially weird, it's often easier to use the draw dumping because that's closer to the PS2 side of things. And so different people will use different amounts of draw dumping versus the third-party tools like RenderDoc.
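The draw dumping idea, one numbered set of files per draw that can be browsed in a file manager, can be sketched in a few lines. This is not PCSX2's dumper: the file naming, the `DrawDumper` class and the single image plus single text file per draw are all assumptions made for illustration, and the tiny PNG encoder uses only the standard library.

```python
import os
import struct
import tempfile
import zlib
from pathlib import Path


def png_bytes(width, height, rows):
    """Encode 8-bit RGB rows (each a flat list of width*3 ints) as a PNG."""
    def chunk(tag, data):
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    # 8 bits per channel, colour type 2 (RGB), default compression/filter, no interlace.
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


class DrawDumper:
    """Writes a numbered image and text file for every draw (hypothetical layout)."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)
        self.index = 0

    def dump(self, width, height, rows, notes):
        stem = f"{self.index:05d}"
        (self.folder / f"{stem}_target.png").write_bytes(png_bytes(width, height, rows))
        (self.folder / f"{stem}_draw.txt").write_text(notes)
        self.index += 1
        return stem


def demo():
    """Dump two one-pixel draws into a temporary folder and list the files."""
    with tempfile.TemporaryDirectory() as tmp:
        dumper = DrawDumper(tmp)
        dumper.dump(1, 1, [[255, 255, 255]], "draw 0: white pixel")
        dumper.dump(1, 1, [[0, 0, 0]], "draw 1: black pixel")
        return sorted(os.listdir(tmp))


if __name__ == "__main__":
    print(demo())
```

Zero-padded numbering keeps the files in draw order when sorted by name, which is what makes scrolling through them in an icon view practical.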
Joe Nash
Yeah, I'd not heard of RenderDoc. That looks awesome.
Tylo
Yeah, RenderDoc is very cool. And then from a CPU perspective, I think each person who is working on, like, especially the recompilers and stuff would kind of do their own thing, and, for better or for worse, I think most of them didn't actually make their way into the emulator, like the things for debugging the stuff. Some of it did, but not all of it. Yeah, so I know we have some integration with some of the Intel profiler tools as well. So we have a thing where, if you flip on a compiler switch, that'll try to send information about the recompiled code. Like, we emulate the CPUs by taking the instructions, like the MIPS instructions, and generating equivalent code for the computer it's running on, which tends to, you know, confuse tools like profilers and things. And so we have a thing to call a special method on Intel's profiler and tell it, oh, by the way, we just made a new function over here, here's its name. And that way the profiler can actually name it properly and make things look nice.
Joe Nash
Awesome. Very cool. Final question, is there anything coming out for them or anything you're working on for them that you're excited about or anything you're able to tell us is coming up in the future?
Tylo
I think more recently I've been trying to do some stuff with the interface and like we just got the ability to. Well I guess we always had it but it's actually working well to translate the UI into lots of different languages now. But there are a few things that are currently missing from some of the on screen stuff. Doesn't work very well with like Arabic or stuff. So that's definitely something I would like to work on is trying to get that to be able to show Japanese and Arabic characters in the on screen ui, which we currently can't.
Joe Nash
Yeah, I imagine Japanese, especially for a Sony console, will be very impactful. It's awesome. Cool. Well, Tylo, thank you so much for joining me today. What a piece of hardware. Thank you for running me through everything. Yeah, thank you for working on such a wonderful emulator.
Tylo
It was a pleasure doing this interview.
Podcast Summary: Software Engineering Daily – Episode on PlayStation 2 Emulation with TellowKrinkle
Introduction
In the November 13, 2024 episode of Software Engineering Daily, host Joe Nash interviews Tylo, a prominent developer for the open-source PlayStation 2 emulator PCSX2. Tylo, also known by his alias TellowKrinkle, has made significant contributions to emulation, including porting PCSX2 to macOS and working on the Dolphin emulator for Nintendo GameCube and Wii. The discussion delves deep into Tylo's journey into emulation, the intricate architecture of the PlayStation 2, the challenges of rendering PS2 games on modern hardware, and the collaborative nature of open-source projects.
Journey into Emulation
Getting Started
Tylo's entry into emulator development was unconventional. He shares:
"I looked up a lot to emulators and emulator development when I was younger... after graduating college I took a part-time job for a while and I had some extra time and I was like ah, maybe I can work on an emulator."
[01:44] Tylo
Initially inspired by the Dolphin emulator, Tylo shifted his focus to PCSX2 due to the lack of macOS support, which was his primary operating system. His early efforts involved resurrecting a legacy macOS interface:
"We got some Windows working. And from there, there were two things that mainly stood out... native support for macOS."
[02:54] Tylo
Contributing to Open Source
Tylo's approach exemplifies the open-source ethos—dive into existing codebases, understand legacy components, and incrementally build upon them. His relentless pursuit to restore missing features, despite limited graphics programming experience at the outset, showcases the iterative nature of emulator development.
Understanding the PlayStation 2 Architecture
Complex System Design
Tylo provides a comprehensive overview of the PS2's architecture:
"The PlayStation 2 it's like three or four processors all kind of connected via a DMA a memory copy engine... the main CPU which is known as the EE or emotion engine."
[06:41] Tylo
The PS2 comprises:
Comparison to Modern Systems
Contrasting with contemporary GPUs, the PS2's graphics processing is more fragmented, with discrete units handling tasks now typically managed within a unified GPU architecture:
"So the vertex shaders are much closer to the CPU... that just wouldn't work."
[10:29] Tylo
Challenges in Emulating the PS2
Graphics Rendering
Emulating the GS and replicating the PS2's graphics pipeline on modern GPUs posed significant hurdles. Tylo highlights the complexities:
"We have to track cycle timing within a single processor... because they actually kind of reveal some of their internal timings to the programs."
[11:20] Tylo
Certain games exploit precise cycle timing between the EE and VUs, making accurate emulation exceedingly difficult. These nuances require meticulous synchronization to ensure games run as intended.
Floating Point Arithmetic Discrepancies
A critical issue arises from the PS2's non-IEEE standard floating-point implementation:
"On the PlayStation 2... the floating point math just like completely ignores what's now standardized as how to represent a 32-bit floating point value."
[12:25] Tylo
Differences include:
Notable Quote:
"You've got to handle these floating point differences or else the AI breaks."
[15:28] Tylo
Emulation Accuracy and Performance
Balancing Act
Achieving near-original console performance while maintaining accuracy is a delicate balance. Tylo discusses various rendering modes:
"PCSX2 just has, if you look through I think the advanced settings or something, there's a number of like clamping modes."
[16:52] Tylo
These modes clamp floating-point values to mitigate inaccuracies but may not resolve issues for all games. Additionally, their software renderer offers higher accuracy at the cost of performance, demonstrating the trade-offs inherent in emulation.
Graphics Synthesizer (GS) Enhancements
Tylo's work on the GS, particularly the implementation of a Metal renderer for macOS, underscores the ongoing efforts to improve rendering fidelity:
"We switched off of [geometry shaders]... it's a very Apple thing, they just were like, no Geometry Shaders."
[25:57] Tylo
This adaptation required innovative solutions to emulate the PS2's graphics effects without relying on geometry shaders, which Metal does not provide.
Blending Accuracy
A significant challenge was replicating the PS2's blending behavior on different GPU architectures. Tylo explains the complexity:
"Once it's sent to the hardware blend unit, the hardware blend unit is now in charge and we can't really do much about it."
[33:42] Tylo
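The PS2 truncates blend results while PC GPUs round to nearest, and on AMD and Nvidia GPUs blending happens in fixed-function hardware outside the shader. A shader-side offset workaround worked on AMD but not on Nvidia, so PCSX2 also offers a blending accuracy setting that moves more draws to a slower, exact path.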
Team Dynamics and Development Workflow
Collaborative Open-Source Efforts
PCSX2's development thrives on a passionate, volunteer-driven team. Tylo notes:
"People work on the things that they're interested in working on... It's up to each developer."
[46:17] Tylo
Prioritization of game-specific issues varies, with developers focusing on the games they care about most. Communication primarily occurs through platforms like Discord, fostering a collaborative environment despite the decentralized structure.
Tooling and Debugging
Effective debugging tools are essential for pinpointing emulator issues. Tylo mentions tools like RenderDoc and custom draw-dumping systems:
"RenderDoc is very cool... within the emulator we have to use our own draw dumping."
[47:43] Tylo
These tools help developers visualize and troubleshoot rendering problems, enabling more precise emulation of the PS2's complex systems.
Future Developments and Enhancements
Interface Localization
One of Tylo's upcoming projects involves enhancing the emulator's user interface to support multiple languages, including Japanese and Arabic:
"Trying to get that to be able to show Japanese and Arabic characters in the on-screen UI, which we currently can't."
[50:48] Tylo
This effort aims to broaden PCSX2's accessibility, catering to a more diverse user base.
Ongoing Challenges
Despite significant progress, certain emulation aspects remain unresolved, particularly regarding cycle-accurate synchronization between the EE and VUs. Tylo remains optimistic yet realistic about the timeline:
"It might not happen in 10 years, but there's always going to be some games that are too timing sensitive for us."
[39:53] Tylo
Conclusion
The episode provides an in-depth look into the complexities of PlayStation 2 emulation, highlighting the technical challenges and collaborative spirit driving projects like PCSX2. Tylo's insights into PS2 architecture, floating-point discrepancies, and the nuanced balance between accuracy and performance offer valuable perspectives for both enthusiasts and professionals in the software engineering and emulation communities.
Notable Quotes
Tylo on Starting Emulation:
"I was like, hey look, it would be cool to have a Mac version of that, right?"
[01:44] Tylo
On Floating Point Issues:
"If you multiply by zero on a PlayStation 2, you're guaranteed to get zero back."
[12:18] Tylo
Regarding Blending Challenges:
"So, yeah, in the basic mode... adds a very small overhead."
[35:15] Tylo
On Team Collaboration:
"We communicate in a, you know, in a discord server. We know each other pretty well."
[46:50] Tylo
Final Thoughts
This episode underscores the dedication and ingenuity required to emulate complex systems like the PlayStation 2. Through Tylo's experiences, listeners gain a deeper appreciation for the intricacies of emulator development and the thriving open-source communities that make retro gaming preservation possible.