
A
Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I'm joined by swyx, founder of Smol AI.
B
Hello. Hello.
C
Today we're so excited to be in the studio with Gorkem and Batuhan of fal.
B
Welcome. Yeah, thanks for having us. Long time listener, first time caller.
C
Gorkem, you and I actually go back a long way, to when it was still Features and Labels and you were just coming out of Amazon. I don't even remember the pitch. I honestly should look at my own notes. But you were optimizing runtimes.
B
Yeah, at first we were building a feature store, and then we took a step back and decided to build a Python runtime in the cloud. That evolved into an inference system, and that evolved into what fal is today, which is a generative media platform. So we optimize inference for image, video, and audio models, but we do a lot more. We try to own this whole generative media space for developers, basically.
C
Yeah. Amazing. And we can talk about that journey. I wanted to also introduce Batuhan. We're newer to each other, but you've come to some of my meetups before. You're head of engineering?
D
Yeah, I lead engineering here at fal, you know, glad to be here.
C
And what's your journey?
D
I met Burkay in 2021 when they were just starting the company, like just before the seed round. You know, Burkay and Gorkem, we met online. We're all Turkish. I think that was the connection. We just met and then they said, oh, why don't you join us? And I was one of the core developers of the Python language, so I had really good experience with developer tools around the Python language. So I started coming here to build the Python cloud, which evolved into this inference engine and the generative media cloud that we are building today.
C
And now you spend less time with Python and more time with, I don't know, custom CUDA kernels.
D
Exactly.
A
Yeah, yeah, I remember dbt-fal from when the modern data stack was out. Can you guys maybe just give a quick sense of the scale of fal? So you just raised a $125 million Series C. We can talk about how I passed on one of your early rounds. We can go through that. How many developers, how many models do you serve? And maybe any other cool numbers?
B
Yeah, we have around 2 million developers on the platform. For the longest time we required GitHub login (it recently changed), so I'm assuming everyone who has a GitHub account is a developer. We have around 350 models on the platform. These are mostly image, video and audio models. It used to be only image, and then we added audio, and the space evolved into video as well. And yeah, that's pretty much the scale. We just closed and announced our Series C round, and we've been growing a lot in the past year and it still continues.
A
Yeah, you had a very nice Series C party, and you guys are over 100 million in revenue, right? So this is not just developers kind of kicking the tires.
B
That's correct. Yeah.
A
That's great. When you say 350 models, what percentage of all the models that you could serve is that? Because, you know, especially with an infinite amount of post-trained versions of these models.
D
We are trying to serve the models that fill a gap in the stack. So we don't add a model that's significantly worse in every aspect compared to other models that we have. We are trying to bring unique models that solve a customer's needs. So these are 350 models. You know, there's like 20, 30 text-to-image models, but one of them excels in logo generation and another one excels in human face generation. So every model has a unique personality, but if a model is significantly worse in all aspects, we don't add it to the platform. So there's like an infinite amount of models that we could add.
C
And do you rely on your own evals or just on the community?
D
We mainly rely on our own evals, and we are also in the community. So we follow the community very closely to see what is going to be the thing in the next generation of apps. We have a good intuition, so if we think something is going to pop up, we just add it.
C
To my knowledge, you haven't published your own evals, right?
D
No, we don't publish internal.
C
And then the community is Reddit, Twitter.
D
Twitter, Reddit, you know, Hugging Face. Seeing how popular the models are on Hugging Face and in other demos.
C
Okay. I just want to give people a sense of where to get this info.
B
The best part of the job is the day of a model release, the adrenaline rush that comes with it. The whole team trying to scramble something together and release it. And it happens every week. Every week is exciting.
A
Can we do maybe a brief history of the models that were the biggest spikes in usage? You know, I think everybody knows Stable Diffusion. And then you have maybe like the Flux models, and then you have Black Forest Labs. You have all these different ones. History-wise?
D
I think the biggest, the initial hit was Stable Diffusion 1.5, which is when we actually pivoted into this new paradigm of the fal generative media cloud. We started hosting it. We noticed that we had the serverless runtime and everyone was running Stable Diffusion 1.5 by themselves. And we noticed it's terrible for utilization and they are not optimizing it. So let's just offer an optimized version of this that's ready as an API to be scaled and doesn't require people to deploy Python code, because we want product engineers to start using it, we want mobile engineers to start using it. So we started offering it. Stable Diffusion 1.5 was very popular. The fine-tunes around it were very popular. Stable Diffusion 2.1 came. It was a bit of a flop, so it didn't get that much attention. And then SDXL came, which was the first major model, the one that brought our first million in revenue, if we consider that. And with SDXL obviously the small fine-tuning ecosystem also exploded. People started fine-tuning on their faces, their objects, whatever. And generations with these LoRAs started to become very popular. And then after Stable Diffusion XL there was a bit of quietness. You know, SD3, there was some drama around it. And the team at Stability left to start Black Forest Labs, which released the Flux models. And that was the first model to reach the barrier of commercially usable, you know, enterprise-ready grade models. In the first month of the Flux models we went from like 2 million to 10 million in revenue, which was a big jump. Next month we were at 20, and it just started going from there. And then video models started coming around. You know, we partnered with Luma Labs, we partnered with other video model companies in China, we partnered with Kling, Kuaishou, MiniMax, and these models created another market segment. That was a big jump.
And the final biggest thing was Veo 3, which actually created this usable text-to-video component. Before, text-to-video was a very boring soundless video. You would not get enjoyment out of it. Whereas now it's such a great experience. You can create all these memes that you're seeing online, all these ads. So that was another big jump for us, partnering with Google DeepMind for Veo 3.
C
Yeah, actually that's a really good history of generative media. That's a soundbite. So I wanted to double click on that, because obviously we can dive in. I think everyone's interested in video, but there's a whole history of the image side I wanted to cover first. What I definitely wanted to start with was the decision to pivot. It's not a trivial decision, but obviously the right one. At the time I would say a lot of people were hosting Stable Diffusion, right? So it wasn't obvious that you could build an entire company around effectively just specializing in diffusion inference. What gave you the confidence? What were the debates back and forth?
B
Yeah, a couple of decisions we had to make there. We could have evolved the company more towards GPU orchestration. Essentially we had this Python runtime, we were running it on top of GPUs, and that could have been the company. But we saw that every single person, every single company who was using what we had, like a little SDK to run Python code on GPUs, was doing the same thing. They were deploying a Stable Diffusion application, maybe using some LoRAs on top of it, different versions of it, inpainting, outpainting, things like that. I mean, it was very wasteful. We decided, okay, this needs to be an API where you actually optimize the inference process and everyone benefits from it, and you can run it multi-tenant, you know, the utilization is much higher then. So that was decision number one. And then obviously after Stable Diffusion, I think like four or five months later, Llama 2 came out and there was a decision point again.
C
You could do language models.
B
Exactly. And of the inference providers at the time, there were maybe a couple of them, and they all went all in on language models. And we decided hosting language models is not a good business. At the time we thought, okay, we are going to be competing against OpenAI and Anthropic and all these labs. It turned out that it was even worse, because the killer application of language models is search, and you are competing against Google in the end. And Google can basically give this away for free, because it's so important for them and it threatens their business right away. And with image and media models it was a net new market. We weren't going against any incumbent, we weren't trying to get market share from someone much bigger than us. And we liked that aspect of it. We thought we could be a leader here. It was a niche market, but it was very fast growing. So we chose to be a leader in this fast-growing niche market rather than trying to go against Google or OpenAI or Anthropic. So that was the decision we made, and it turns out it was a good one, because we are able to define the market we are in and educate people and grow with it. And so far it's been growing fast enough that we were able to build a whole company around it. Yeah.
C
And I think you noted at AIE that now there's a generative media track in the generative media specialist.
B
We're calling it generative media by the way.
C
Yeah, I mean obviously it's a thing and people care about it, and I do think it's going to change the economy. And as a creative person I also wonder what it's going to do for us. But I want to keep it technical and keep thinking about the pivot, because I think it's still one of the most interesting pivots I've seen in the AI era. You were not a CUDA kernel specialist at the time, right?
D
I come from a compilers background, so my job was optimizing a bytecode interpreter to make stuff faster, which is performance engineering. And yes, I don't think at the time there were that many CUDA kernel specialists either. So we were there at the right time. The space was actually so much worse than what we have today. Running basic Stable Diffusion 1.5 was like a UNet with convolutions, and the convolution performance on environments was like, you're getting like 30% of the GPU power if you just use raw PyTorch, because no one cared about it. So there were so many low-hanging fruits that we started to pick up and optimize, and it kind of evolved, evolved, evolved. Right now it's a much more competitive space, where Nvidia has like a 50-, 100-person kernel team that's writing kernels. You're competing against that. At the time no one really cared about it. So it was like a good new field for us to go thrive in.
C
There's no community effort like a vLLM.
B
Exactly. When these models were first released, no one in the world had run them in production. It just didn't exist.
C
It's like a research output.
B
Exactly. Yeah. You had your local GPU, or maybe you had like a single GPU that you rented from the cloud, and basically this was a research interest rather than a product interest. And no one at Meta, no one at Google had run this in production. So we also thought this is a good time to start a company around this and actually spend time optimizing it as much as we can, because if we can get millions of people to use this, there's a lot of economic value to be created there.
A
Can you talk a bit about how much of a performance boost you got? Because I know when I met you guys, you were about a million in revenue. You were like, wow, we're writing all these custom kernels and maybe part of it is like, okay, how many kernels can you actually write as you support all these different models? What's kind of like the breadth of them? Are you writing kernels that you can reuse across models? How much work do you have to do on a per model basis?
D
It really evolved in the past three years. When we first started, there was a single model, Stable Diffusion 1.5. So all of our kernel efforts were, how do we make Stable Diffusion 1.5 as fast as possible? You go from 10 seconds with PyTorch. At the time there was not even a torch.compile, TorchInductor or whatever. So you were going from 10 seconds to maybe like 2 seconds on the same GPU. And we started with that. The next thing was adding more models: Stable Diffusion XL was a different architecture, PixArt was a different architecture. All these different architectures started coming around. We said, let's build an inference engine, which is what we call a collection of kernels, parallelization utilities, diffusion caching methods, quantization, all that stuff combined into one package. And so we built this inference engine at the same time PyTorch 2.0 was released with TorchInductor and TorchDynamo to do torch.compile, which is essentially a way to trace the execution of your neural net and generate Triton kernels that are fused, that are more efficient. And I'm a big sucker for just-in-time compilers. I used to work on PyPy, a just-in-time compiler for Python. And we said, this is a great idea, let's apply this, but in a more specialized, more vertical way for diffusion models. At the time it was UNets. Now it's diffusion transformers, which are significantly different from your autoregressive transformers in terms of the profiles: how compute bound it is, which kernels are taking the majority of the time, whether they are doing bidirectional attention or causal attention. So we started doing that, and what we have today is an inference engine that gets you 70, 80% of the way for the majority of the models built on diffusion transformers. And we still have a lot of custom kernels for a lot of models to squeeze out more, because there are still small differences. Every model wants to make an architectural difference.
You guys see this even for stuff like, you know, Qwen, DeepSeek, whatever. Even if we know an architecture is the best, people want to tweak it a little bit just to make sure, oh, we are releasing something cool. So we saw this, and for that we have to write custom kernels for the custom RMS norms that people are doing or whatever, stuff like that. So we have a decent amount of kernels, like over 100 custom kernels. This doesn't include the auto-generated ones. You know, we have templates of kernels that generate, like, for thousands of different shapes, problem spaces, whatever. If you consider those, we obviously have tens of thousands of kernels that we are running and dispatching at runtime, but that's pretty much the depth and breadth of it.
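Editor's sketch: the idea of "fused" kernels mentioned above can be illustrated in plain Python. This is only a toy model of operator fusion, not fal's inference engine and not what torch.compile emits; the function names `unfused` and `fused` and the pass counter are made up for illustration. The unfused version runs three elementwise steps as three full passes with two temporary buffers, the way three separate GPU kernels would each read and write memory. The fused version does the same arithmetic in one pass.

```python
import math


def unfused(xs, scale, bias):
    """Three separate 'kernels': each one is a full pass over the data
    and materializes a temporary buffer for the next step."""
    passes = 0
    scaled = [x * scale for x in xs]        # kernel 1: multiply
    passes += 1
    shifted = [s + bias for s in scaled]    # kernel 2: add
    passes += 1
    out = [math.tanh(v) for v in shifted]   # kernel 3: activation
    passes += 1
    return out, passes


def fused(xs, scale, bias):
    """One 'kernel': identical arithmetic, but a single pass and no
    intermediate buffers."""
    out = [math.tanh(x * scale + bias) for x in xs]
    return out, 1


if __name__ == "__main__":
    data = [0.0, 0.5, -1.25, 3.0]
    slow, slow_passes = unfused(data, 2.0, 0.1)
    fast, fast_passes = fused(data, 2.0, 0.1)
    print("same result:", slow == fast)
    print("passes over memory:", slow_passes, "vs", fast_passes)
```

On a real GPU the saving comes from memory traffic and kernel launch overhead rather than Python loop counts, which is why fusion matters most for memory-bound elementwise work.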
A
And on average a model on fal runs 10x faster than if I self-hosted it? Like if I just take Stable Diffusion, right, and I put it...
D
I know this might be a bigger discussion point: do we consider speed as a moat? It comes down to this: the existing open source ecosystem evolves so fast. Like, this might have been true three years ago. Now PyTorch is already very, very good for H100s, right? What about B200s? When you use PyTorch with B200s, Blackwell chips, you're not getting the best performance. So our main objective and our main goal is, for whatever GPU type you're using for these diffusion models, we're going to extract the best performance at any point in time. It could be 1.5x, it could be 3x, it could be 5x. For certain models, it could be 10x. It would be a bit of an unfair thing to say, oh, we're going to make everything magically 10x faster. No one in the world can do that.
B
We are lucky that this is a moving target. In the open source community, everyone catches up, but at the same time new chips come out, new architectures are released. So we are always ahead of what's possible, and then they catch up, and we have to stay ahead of it. That's how we can create differentiation: because it's a moving target, because there's so much going on. Whenever something new comes up, we are the first ones to optimize it, the first ones to adapt our inference engine to it. So at that time we are the fastest place to run it, and that helps with margins, things like that. But eventually people do catch up. I think it would be very hard to create this differentiation over the long term if there were no new architectures, no new chips. But luckily there are, all the time.
A
Yeah. And I think with images specifically you cannot stream a response, so to speak. When you have a language model, you're kind of bound by how quickly you can read. So even with Groq, it's impressive to show, I think, in a second, but I'm not reading that fast. Right? So it can go slower.
B
Yeah.
A
Versus with images, it's like you just need to see it. That's why Midjourney now has like the draft mode, for example. It just gives you this like very low quality resolution. Yeah. But at least you can see whether or not it's going in the right direction. How much of that is actually true for your customers? What do they care about the most? Like, is latency that important? Like what's the range of latency that matters?
B
Yeah, latency is really important. One of our customers actually did a very extensive A/B test where they purposely slowed down latency on fal to see how it impacts their metrics, and it had a huge part in it. It's almost like page load time. When the page loads slower, you make less money. I think Amazon famously did a very big A/B test on this. It's very similar when the user asks for an image and is iterating on it. If it's slower to create, they're less engaged, they create fewer images, and things like that.
C
Yeah, it's the same learning that Amazon has like every 10% improvement in speed.
B
Yeah, exactly.
C
The elasticity is high. The other thing I wanted to also dive into, putting a little bit of the investor hat on: one of the reasons for fal's success is kind of not within your control, which is when and how people release open models for diffusion, which at the time was just Stability, and there was no Chinese output. I mean, we did have other image models, but they were not great. So you made a bet when it just wasn't super obvious. But then the other thing, which is what you're touching on, is that the diffusion workload is very different from the language workload. And the language workload is being super optimized, whereas diffusion is not. So you just had kind of no competition for a while, which is fantastic for you.
B
100%. And open source, we benefit a lot from it, obviously. But in the past six months, a year, we started working with some of the closed source model developers as well. Like behind the scenes, helping.
C
But they're not sending you their weights.
D
They do.
B
They are, yeah.
C
What do you have to give them, like, to guarantee that?
D
I mean, we are like any cloud provider. Like, what do they think about AWS or Google Cloud or these neoclouds? There's like 50 neoclouds, right? We are not that different from any other cloud provider. And this is why we packaged the inference engine in a way that, you know, they can self-serve and get 80%, 90% of the performance. So they don't even have to show us their code. We have our own cloud platform, and our inference engine is available only on that platform. So they can tap into that when they deploy their code and their model weights to us, and we don't really have to look at it. If they want to collaborate with us, which some companies did in the past, then essentially we have performance engineers acting as forward-deployed engineers on their behalf and writing custom kernels for them.
C
Okay, have you disclosed who you're doing this for?
D
We disclosed PlayHT, PlayAI. That was one of those. We have like four different major video companies that we are doing this with, and one image company that I don't think we disclosed.
B
Yeah, as you can imagine it's a little sensitive for them.
C
So yeah, I would say Replicate started serving Veo 3 models and we were like, okay, are you just wrapping their APIs or something? And I think so. It's not obvious how much integration there is going on and how much of that is on your infra or your tech.
B
Just to be honest, some of that is happening too. Veo 3, I think for everyone it's just an API wrapper.
D
Yeah. You have a dedicated pool that you can serve to your customers with different speed, SLA guarantees, whatever. That's how it would work for something like Veo 3.
C
But your objective is to be a one-stop shop. But then also you can do inference better than some of these other places.
D
Google is obviously hard, but with other vendors our goal is helping them run inference, because these are research labs that don't necessarily invest heavily in inference optimizations. Scaling up infrastructure, that's another challenge that we can talk about. On launch days, for some of these models their own website just explodes, and the fal API is working fine because they deployed there. You can scale up to thousands of GPUs instantly. So there is that aspect too. When we pitch this value prop, as well as the distribution that we bring to them, it's a no-brainer for them to just deploy their model to fal and use it for both the fal marketplace and their own distribution channels.
C
Yeah, a couple of follow-up questions. Just on PlayHT, because you mentioned it: music, audio, is that a different workload than normal diffusion, or is that the same?
D
I can't really comment on their architecture, but in the open source world, some of these models are autoregressive and some of them are diffusion based. There are notorious ones known for diffusion, as you guys can guess, one of the biggest companies. So it's similar workloads, but at the end of the day our inference engine is very versatile and our performance team is very versatile. With PlayHT we had very deep collaborations, where we had three engineers at some point helping them optimize their inference process as well as infrastructure to get them to 80 ms end-to-end time to first audio chunk, which is a very impressive thing for real-time text-to-speech workloads.
C
And then the other known hard problem is serverless GPUs which is a thing that everyone has chased a lot and many people have failed. What can you say about what you've done there to make it happen? So for example, Modal has been talking a lot about their GPU snapshotting, but I imagine it's like a stack of technologies in order to achieve the scaling.
D
It's a stack of technologies. The biggest question with serverless GPUs is, are you just wrapping another platform? Like if you have a Kubernetes deployment, are you just wrapping it and giving people access? Or are you actually multi-cloud? Do you manage your own orchestration chain? Do you manage your own container runtime? Do you manage this whole stack? In our case we started with the Kubernetes version, when we were just doing it for ourselves. And the Kubernetes version at Google Cloud was fine in 2022 when we wanted to get eight A100s, but when we wanted to go to like thousands of H100s, it was not going to work. It's a terrible position to be bound to a single cloud. So right now we work with six cloud providers, and we have 24 different data centers in four different countries, and we now do long-term data center leases as well, to manage some of the hardware chain ourselves. In this world, we had to build our own orchestration layer, we had to build our own distributed file system, we had to build our own container runtimes, the whole stack, to make sure that the cold starts are extremely, extremely fast, which is one of the things that matter when you're scaling up, as well as to handle actual scale. We are managing over 10,000 H100 equivalents today. Yeah.
C
And a CDN for caching.
D
The CDN, that's outside of the serverless infrastructure. But the CDN, content moderation systems, all of these make up the platform. There's so much.
C
You also do moderation.
D
We also offer content moderation services to the foundation model companies, for them to moderate their inputs and outputs.
C
I see, I see. As a separate product.
D
Yes.
A
From a GPU perspective, do you always need to be on the latest? You keep mentioning H100.
D
The majority of our workloads are on H100s, because price-perf-wise it makes sense. But Blackwell is obvious. We have five people dedicated to writing Blackwell kernels right now to make sure we can use it. Because theoretically it looks good. FLOPs-per-dollar-wise it makes sense. But can you reach the actual FLOPs?
A
No.
D
So we have a dedicated team that's working with Nvidia directly to write custom kernels for Blackwell for diffusion transformers, to get to the point where it makes sense per dollar. And then we would start with our own workloads as well as some of our foundation model companies. We would tell them, oh, if you want to migrate to Blackwell, here's an inference stack that already works.
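Editor's sketch: the "can you reach the actual FLOPs?" question above can be framed as a break-even calculation. This is a back-of-the-envelope model, not fal's pricing or methodology. The function name `breakeven_utilization` is made up for illustration, and every number in the example (hourly prices, peak throughput, utilization) is a placeholder rather than a real H100 or Blackwell figure. Given what fraction of peak you already achieve on the old chip, it returns the fraction of peak you must achieve on the new chip to match the old chip's achieved FLOPs per dollar.

```python
def breakeven_utilization(old_price, old_peak, old_util, new_price, new_peak):
    """Fraction of the new chip's peak FLOPs needed to match the old
    chip's achieved FLOPs per dollar.

    old_price, new_price: cost per GPU-hour (same currency)
    old_peak, new_peak:   peak throughput (same unit, e.g. TFLOPs)
    old_util:             fraction of peak actually achieved on the old chip
    """
    if min(old_price, old_peak, new_price, new_peak) <= 0:
        raise ValueError("prices and peak throughput must be positive")
    if not 0 < old_util <= 1:
        raise ValueError("old_util must be in (0, 1]")
    old_achieved_per_dollar = old_peak * old_util / old_price
    return old_achieved_per_dollar * new_price / new_peak


if __name__ == "__main__":
    # Placeholder numbers only: the new chip costs 2x per hour and has
    # 2.5x the peak throughput; the old chip runs at 50% of its peak.
    needed = breakeven_utilization(
        old_price=2.0, old_peak=1000.0, old_util=0.5,
        new_price=4.0, new_peak=2500.0,
    )
    print(f"need {needed:.0%} of peak on the new chip to break even")
```

With these placeholder inputs the break-even point comes out to 0.4, meaning the new chip only pays off once its kernels sustain more than 40% of peak. If off-the-shelf kernels reach less than that, the newer chip is worse per dollar despite the better spec sheet, which is the gap a dedicated kernel team is meant to close.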
B
We are at that point where we should be the ones pushing the boundaries on Blackwells, because no one else is doing this work. And maybe it doesn't make sense economically right now, price-perf-wise, but we know it can. So we are maybe like a couple of months away from that point. And then whenever it does, we'll probably switch as many workloads to Blackwells as possible.
C
Just to be super crazy, when does it make sense to just work on an ASIC?
D
I don't think it does. That's my honest opinion. This is one of the most controversial topics, right? Are all these ASICs a great idea? If you're like SRAM, if you're memory-bandwidth bound and you can put it all in SRAM, is it even economically viable at that point? I don't know. But the summits around these chip designs, you see, okay, what is the overhead of an Nvidia GEMM instruction? Right. It's like 16%. So you're essentially buying a matrix multiplication machine. So it doesn't really make sense to specialize it that much. And some of the B300s are going to have a better softmax instruction that gets like 1.5x, whatever. And that might be one way Nvidia gets better performance for the majority of workloads, which is attention-heavy stuff. I think it might make sense for Nvidia to add more specialized stuff, but for us I don't think it will ever make sense to build ASICs.
C
I was just thinking about it from first principles, since the diffusion workload is very different. But also, obviously, there are still a lot of changes in the architecture, so you need it to be general purpose.
B
We don't have a single model that we are trying to optimize. We are trying to do it for the newest, the best, always. The flexibility is therefore really important.
C
I was going to show, I'm going to pull up the Qwen MMDiT, where there's like this dual-stream thing, which I think SD3 had.
D
Yes. SD3, Flux.
C
Yeah. Is that the standard model now?
D
MMDiT, so it's also a controversial topic. You know, the Scaling Rectified Flow Transformers paper, the SD3 paper, came up with this architecture, and then someone on our research team, Simo Ryu, he's our head of research, found out that just using MMDiT blocks is inefficient, you need to mix them. And now there are controversial opinions. You know, the Movie Gen paper was saying, oh, MMDiT is completely unnecessary, you can just use a single-stream DiT, whatever. So there are controversial opinions in terms of architecture changes, which I understand, because everyone wants to do a different architecture. No one wants to do the same architecture, because it's lame. Otherwise it's just a matter of compute and data. And these researchers don't feel proud that their model is an output of data and compute. They want to make a novel research change. So I think the architectures are going to keep changing until this paradigm of researchers changing stuff for the sake of changing things.
C
I'll talk about a couple of other architectural things just to keep it bounded within this topic. Distillation was a thing for a while. SDXL Lightning. You guys did fantastic demos with tldraw, who we've also had on the podcast. Fantastic episode. What happened to those things? How come they're not popular anymore?
B
I think it makes for a good demo. You could build real time applications, you could build these drawing applications, things like that. But I don't think people could build applications that have user retention long term. People couldn't really build useful things with it.
C
Let me play out what I thought was going to happen and then you tell me why it didn't happen, which is consistency models for drafting. It's like you use your hand to draw things and it creates the draft. Then you upscale.
D
Right.
C
With a real model, but that's it. Why can't it be a two stage process instead of one stage?
B
Yeah. And I think one thing that happened is Flux. That generation of models was not good at image-to-image when it first came out. So you need that good image-to-image model to be able to draw. Maybe it needs to be revisited around this time with some of the editing models.
D
Maybe like image-to-image and ControlNets. With SDXL, ControlNets were very popular, where people used to do this stuff like sketch the image, whatever. And with Flux I think people cared less about it. One thing that I keep thinking about is, is this true for LLMs too? I always default to Claude 4.1 Opus. Right? Even if it's slower than Sonnet, I just know I'm going to get the best quality.
C
Exactly. That's what's happening here.
D
Yeah, it seems like that's what's happening here as well.
C
Okay. Anyway, as a creator, I want fast, quick drafts and then I can refine. Right. So I don't know why it didn't happen.
B
More true for video models. Right. It used to be like five minutes, four minutes for a single five second generation. Now it's mostly under a minute, but you want 10 second, five second generation. And then because the workflows of creatives, when they're working with it, they generate a ton of videos and then pick one and then create a story around it. So when you watch these people actually generate videos, they generate hundreds at a time and they have to kind of sit around and wait and then iterate on it. The faster speeds mean a lot for creators.
C
Yeah, it does. The other thing I wanted to also briefly touch on before we go back to the main topics is the autoregressive models which you mentioned.
D
Right.
C
Honestly, I still think Gemini is underrated, because they were first, but then obviously OpenAI did the 4o image gen and that was a huge thing. I actually even wonder if there was a panic for you guys, because obviously this is SOTA image gen and no one else has it. It's not open source.
D
We passed through those eras so many times.
B
Stopped worrying about it.
D
Gorkem has like good stories around DALL-E.
B
Yeah, I mean I talk about this stuff. Like when DALL-E 2 first came out, I was like, okay, OpenAI is so far ahead of anyone else, it's impossible for Midjourney. And then people caught up within months, and then Stable Diffusion was even maybe better or just as good as DALL-E, like a couple of months later, and it was open source. So like a year later, the same thing happened with Sora. They put out those videos, and that time around I think we were excited because now that people see that it's possible, like this is actually doable, researchers get motivated and they see the hype, they see that this is possible, so they work on it. And within a couple months we had maybe not Sora-level, but much better video models. Now we have video models that are much better than Sora. So whenever we see someone actually pushing the frontier, it's a reason for excitement because now that's possible. Other people are just going to do it within a couple months, so we don't panic anymore.
A
Does the fact that Anthropic doesn't have an image generation model tell you anything about what the larger labs care about?
D
It tells you more about Anthropic's own personality than about labs in general. Because if you look at xAI, if you look at Meta, if you look at OpenAI, if you look at Google, they all have really good image models.
B
Yeah, Google, in their last announcement they used the words generative media, by the way, which was a proud moment for us. And they focused on generative media as much as their new LLM. So some labs definitely care about it, and for some labs it's not a priority.
D
Look at xAI, they keep pushing like.
A
In between AI slop. Yeah, I know, I know, it's crazy. And waifus and levels of interactivity. You have images, you have video, now you have Genie, this kind of like more world model. You have kind of like gaming applications of that. How far are we from like fal getting a lot of traffic on those models? Like is it mostly experimental today in open source? Obviously Genie is impressive, but like it's a Google model.
B
I have a very optimistic take on this and then maybe like a normal outcome. I think at worst we are going to have very capable video models that come out of world models. Right. It's going to be a very controllable video model and the use cases will be similar to what video models are. You're going to create content, but you're able to control the camera angles. You're able to control the video model a lot better than what you can do today. At worst we are going to get that from world models. And at best, I think it's very hard for anyone to predict what's going to happen. Yeah, movies and games, it's going to be something in the middle where you can be part of a whole movie universe that's going to be playable. So it's boundless possibilities. What's going to happen at best and how affordable is it going to be? Is this ever going to reach mainstream adoption? We'll see all that, but it's definitely technically incredibly exciting and impressive what's coming out of these labs.
A
Yeah, I need to find the paper again. But there was this study on video models and image generation, understanding physics and it could predict the orbit of a planet. But then when I actually had it draw out the gravitational forces, it was completely wrong. And so I think that's my thing with world models. I understand the creator application, which is how you can create consistent world. But I don't know if like the other side of people that are like, hey, these are like the best way to like simulate the world and like get intelligence and things like that.
B
I don't know, there's optimism around it too. Because whenever you talk to someone who's working on robotics, they're bottlenecked by the amount of data they have. And from all these past three years of AI innovation, we've seen that whenever there is an abundance of data, that type of model actually improves a lot. So with robotics, we expect something similar. Whenever they figure out this data problem, those models are going to get better as well. So that's why people are so optimistic. Okay, maybe this solves the robotics data problem and it's boundless opportunities there.
D
And regarding the example you mentioned about gravitational forces, I think this is still the same problem as, oh, LLMs can't do 9.9 plus 9 point. Yes, they can. You just need to train them with more data. You need to have a better tokenizer. It is a reason, whatever. It's just a matter of data scale and the underlying fundamental architectures. But I don't think it's going to change that much. We're just going to put in a thousand x more data, a thousand x more compute, and we'll get the best physics simulators. And I think this should be possible with existing signals coming from the data.
C
Just to double click on video stuff as well. You had a great slide at AIE where you're like, currently 18% of fal's revenue comes from video models. And it might be a.
B
That was February. So now, now it's probably over 80% 50. Okay.
C
It's like over 50?
B
Yeah, yeah, 100%.
D
Wow.
C
Okay.
B
I guess editing models brought some life into the image as well. So like both of them grew. But yeah, video grew faster for sure.
D
Video is like pretty, pretty significant. And one of the main drivers is open source models where, you know, in February there was Hunyuan Video, I think that was pretty good. There was Mochi from Genmo, but the quality still wasn't there. And Wan from Alibaba was an insanely good model. And they released a newer version of this a month ago I think, or a couple of weeks ago. And now it's getting so, so popular. And we can run this model at 480p, the draft mode version, we can run it in under 5 seconds. So people can have an instant feedback loop. And then when they want to go to 720p, full resolution is just like 20 seconds. And we're planning to bring it down to 10 seconds.
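As a rough illustration of why the draft mode matters for iteration, the following sketch compares a draft-then-refine loop against generating every candidate at full resolution. It takes the 5-second and 20-second figures quoted above as per-clip times; the candidate count of 10, the sequential execution, and the function names are assumptions made for the example, not measurements.

```python
DRAFT_SECONDS = 5   # 480p draft mode, using the quoted "under 5 seconds" as an upper bound
FULL_SECONDS = 20   # 720p full resolution, as quoted


def draft_then_refine(num_candidates: int, num_keepers: int = 1) -> int:
    """Total seconds to draft every candidate, then re-render only the keepers."""
    return num_candidates * DRAFT_SECONDS + num_keepers * FULL_SECONDS


def full_only(num_candidates: int) -> int:
    """Total seconds to render every candidate at full resolution."""
    return num_candidates * FULL_SECONDS


# Hypothetical session: explore 10 candidates, keep 1.
two_stage = draft_then_refine(10)
one_stage = full_only(10)
print(two_stage, one_stage)
```

Under these assumptions the two-stage loop spends most of its time on cheap drafts and pays the full-resolution cost only for the clip that is kept.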
C
Yeah, that's amazing. And I want to double click on that. For a while I was kind of bearish on Alibaba because they kept releasing papers with very cherry-picked results, and it was like, okay, we're on GitHub, and then you go to GitHub and it's a README.
B
I mean you can see something really change. They've been using new image models.
D
No, we haven't talked to them, but it seems like. And now there's competing teams inside Alibaba. Wan is a really good image model, but they released Qwen Image as a competing image model. We think Wan as an image model is actually very, very good. If you run Wan with a single frame instead of 81 frames or whatever, you get a really good text-to-image model out of it. And this is just because of the pure amount of data that you put in from the videos. So now Alibaba has two really good models from their labs, and then there's smaller labs in China that you might not hear about. But StepFun, they released an image editing model, HiDream v1. There's all these small labs that are releasing, because I don't think training these image or editing models is that expensive, and video models might be slightly more expensive. My guess is training these costs like a couple million dollars, which is not that much, especially since they're probably backed by some sort of entity other than Alibaba. There's StepFun, whatever. They probably have a really good amount of money. So training these models will bring you a lot of attention, and it's more attention than you would get releasing a subpar LLM, because the LLM space has so much more competition. So just training this for like a million dollars, a video model, and then releasing it, I think that brings you a lot of attention.
B
It is a hack. When you look at Hugging Face. Let's look right now. I'm sure the top models are image models. Like Qwen Image Edit is probably up there. Probably up there.
A
Number one. Number one. Hunyuan GameCraft is number two. Number four, then you get Gemma 270.
C
Million some image stuff.
D
ByteDance hasn't open sourced, but they have a really good team, Seed, that's their new lab. They're working on Seedream, Seedance, OmniHuman, stuff like that. We have a good partnership going with them to hopefully have their models hosted with us as well. And the idea is, I think the team that they were able to assemble is very good, and it's coming from their previous researchers, whatever. ByteDance was doing really good open source stuff with SDXL-Lightning. They released the SDXL-Lightning paper, AnimateDiff-Lightning. So I'm pretty hopeful about them.
C
Yeah first of all hopefully they don't reach out to you when they launch, they just drop and then you.
D
At this point people reach out to us because we are the market leader so they just reach out to us for getting distribution.
C
Yeah there's a Chinese platform that they always launch on first which I forget the name of it but you have to.
D
We also get day zero launch with majority of these models.
C
So basically I think the question is always like you're the ones making money. Stability did not make money from stable diffusion.
D
I think the thing that Black Forest Labs did was very, very interesting in this aspect. They released three different models. An Apache 2.0 licensed, extremely distilled model, which is Schnell. This is the Schnell version, like the four-step generations for lower quality stuff. They released a Dev model with a non-commercial license, where their inference partners are paying a revenue share, and this is a very good way. And then there is a Pro version where you can collaborate on hosting it, and the revenue is obviously different for that as well. This is I think a very smart choice for labs whose whole premise is releasing models. But if you're a company that is doing a product on the side, you don't necessarily need to make money from the open source models. You're doing it for getting researchers, you know, hiring people, getting distribution, whatever. So it really depends on the company's goals. Like in Alibaba's case, they don't care if the Wan model is hosted on their API. Whatever Wan makes doesn't touch Alibaba's top line revenue. So for them it's a no-brainer to release it and get attention and maybe get some leads for their Alibaba Cloud offerings. But in general for Black Forest Labs or companies like that, I think it's a smart move to release a distilled version as fully open source and a less distilled or actual model as non-commercial, and then partner with inference companies and stuff like that.
A
What's the distribution of Usage? So is 80% of your revenue 5 models or are people really using the long tail of all these open models outside of the initial launch?
B
I think there is some power law, but not as much as you would think. And it keeps changing. That's the other part. It's not only a single model that's being used a lot; month to month, it changes a lot. This summer has been crazy. There have been just countless new video models, new image editing models. The leader kept changing week over week even. But if you take a step back and look at which models are being used, people want to use either the best, most expensive video model or they want to use a cost efficient, good but cheap enough video model. So those two models are usually used a lot, and whatever those models are, it changes week over week.
D
And yeah, one good example is like Flux Kontext was released in late May and Qwen Image Edit was released like two weeks ago, and now it's topping Kontext Dev. It's insane how quickly this stuff transitions just because there's a better quality model, and that's the value prop. Like you don't have to set up the infrastructure to manage Flux Kontext Dev. As soon as Qwen Image Edit is available you can just switch to that with fal.
A
I mean it seems to me that if some models are open and some models you have to pay a revenue share, you ideally want to move the people off the revenue share models into the open models, right? What's that dynamic?
D
It's all priced in.
B
I'm also thinking like, okay, we'll do whatever our customers are going to be successful with. We are still early enough that these small calculations, I don't think they matter. I'd rather people actually go to production and build products with it and be successful, rather than like, okay, 20% here, 10% there.
A
I mean you're doing 100 million of revenue. Cool.
C
I'll just ask a few more questions we had around just like the how people really use this. Okay, I'll ask this super obvious question. How much is not safe for work?
D
Almost nothing.
B
Negligible.
C
Yeah, you don't moderate everything. Moderation is optional, right?
D
Moderation is optional to a level where illegal content is moderated and we also track the non illegal content NSFW moderation and we haven't seen more than 1%.
B
The models themselves are actually not generating that type of content.
D
Some model providers, especially if we look at Black Forest Labs models, the models are incapable of generating it, or they're annealed in a way that it is prevented. And the majority of our customer base, if you look revenue-wise, is enterprises or more on the higher level of stuff, where some of them might be like user-facing mobile applications. But for the last six months, nine months, we've been transitioning more and more to enterprise, where there's less of a need for them.
C
So what are those enterprises doing apart from building a general purpose chatbot that can generate images? Maybe Canva would be a good use case. But my imagination is a bit limited.
B
Beyond that, advertising seems to be absolutely growing, and if you think about it, it fits very well. And let's talk about video advertising. So I keep repeating this, but some companies talk about, oh, we are going to change Hollywood, filmmaking is going to be revolutionized. I don't think it's that interesting. How many movies do you watch a year? Maybe 20, 25 movies. How many movies do you watch in the theater? Three, four at most. So if there are like thousands of movies a year, people won't be able to watch all of these movies. Like there's just.
C
It's a max quality.
B
Exactly, exactly. And with advertising it's the exact opposite. The more content there is, the more different ways you can create ads. There is always economic value attached to it. So you can create an unlimited number of ads, unlimited different versions of them, and the more personalized it is, the more economic value there is behind it. So ads fit really well with this type of technology, because there is no limit to what you can create.
C
I'll tell you a side comment about a Silicon Valley trend I'm seeing which I cannot explain, which is that all these YC startups and all these, they're spending between 10 to $70,000 per launch video.
D
Yeah.
C
In the age of generative video, like they're hiring, you know, actual creative directors, hiring a studio, hiring actors. I was in one of them. And like, do you need all that when you have generative video?
B
Like, I think clearly Roy started talking about generative video.
D
I don't know if you guys know PJ Ace. I think he's like the absolute killer for this stuff.
C
He launched like a. Is it a Super Bowl ad or something?
B
He did like a basketball player. Yeah, yeah.
C
NBA Finals, right?
B
Yeah.
D
He also did our, like, Series B announcement video. Like, we're pretty close with him, and it's insane what he's able to come up with and how viral it goes, compared to, you know, these videos where you spend like hundreds of thousands of dollars. Right. You know, like, you just need to create viral content, and these media models are the best way to do it. And we are still at the infancy of this. Right? Like, obviously it might not be professional quality. I still think, you know, human-in-the-loop, mixed content is the way for today, but in six months, who would know? Like in 12 months, I think like 80% of it's going to be generated. Like, we were watching the Super Bowl and we were saying, oh, how much of this video is AI generated? It looks like AI generated. It could be, right? You can't tell. So I think at some point we're gonna have like 80, 90% AI generated, all that.
C
It reminds me of, I think. Who's the guy? fofr from Replicate, obviously, is the best inspiration for all these workflows. He overlaid some kind of NBA realistic sort of LoRA on top of game footage. So you could play like NBA 2K, but it looks like a real video. Yeah. I was like, what the hell? It's pretty cool. So maybe. And that's the other part of my question. And I wanted to get into ComfyUI, which is how much LoRA serving is going on.
B
Right.
C
How much custom A lot. Okay. Is it like majority?
D
That's one of the reasons Majority. But if it's like 30%, is it like the majority and everyone trains their.
C
Own LoRAs or you pick it off of like a LoRA marketplace.
D
There's like.
B
That's why open source works very well with image and video models, because you tap into this big LoRA ecosystem. Like, I've never seen a closed source model that can create a good LoRA ecosystem. It just basically doesn't exist. Like, maybe there are Midjourney srefs, but I don't know if you can consider them LoRAs. Srefs are just seeds.
D
Right. Conditioning. Let's call it like another condition, like a prompt.
C
Yeah.
B
And then only the open source models have these rich LoRA ecosystems, and it's extremely, extremely popular.
D
But even for the oldest models, it brings a new life. When you see these cool LoRAs. We still have a lot of people using SDXL with their own LoRAs because they're happy with the quality. It's fast enough, it's cheap enough, it's amazing. These models are not single-shottable like the language models, even the editing models. GPT Image 1 or like Flux Kontext, whatever, Qwen Image. If you put your face or if you put multiple people, whatever, you can't get the quality. It's going to be 90% there. But if you train for 1,000 steps with 6 to 20 images, you're getting 99% accuracy. We worked a lot on finding the right hyperparameters, writing distributed trainers, distributed optimizer stuff. And with those, people can train their LoRAs in under 30 seconds now on the platform, run inference with them in the same job, and get like 99% accuracy for the same face. Character consistency is one of the biggest challenges, which maybe they're facing more on the enterprise side and less on the consumer side. If you're creating AI slop, you don't really care who it looks like. But if you're actually doing a product ad, you want it to look exactly like the product. Every single pixel on the product's banner, whatever, you want it to look like that. So you train like a LoRA with 20 images, and then after that you have an almost pixel-perfect model.
A
All right, we have to train a LoRA for every guest. Then we can make thumbnails with the guest doing.
C
I actually think that's a very good application because it's a nice way to inject brand, but not in a strict style.
B
And we are just entering post-training on video models. And what's that going to mean? Because we didn't have a good base video model for which it made sense. But now we have companies really investing into post-training on Wan 2.2 or Hunyuan and creating lip sync models on top of it, creating different video effects, camera angles. Seems like there's a lot of possibilities with creative data sets that people can do. I think in the next six months to a year we are going to have a lot of companies that are just built on post-training of open source video models. Wow.
A
Let's talk about pipelines. So we had comfyanonymous on the podcast. ComfyUI is kind of like this community that if you're into it, you love it. If you don't know about it, you kind of underestimate it in a way. But people create all kinds of crazy workflows. One, have you thought about doing pipelines? Obviously you host all the models. There's kind of this.
D
We do have a pipeline product called fal Workflows, and you can chain models together, but it's obviously less flexible than Comfy, in that you can only chain different models' outputs and not the intermediary stuff. In ComfyUI you can access the latents from one model and then pass them on, plug in a latent object or whatever. In our case it's more limited, but we have a workflows product and we have a serverless Comfy product where people can bring their own ComfyUI workflow and run it as an API by just posting the workflow and inputs.
A
And have the models be served by you.
D
Yes.
C
So is that a bullish thing? Is that going to be commoditized by big models?
D
The thing that we saw is, as the models get better. ComfyUI was a much bigger thing, or relatively a much bigger thing, two years ago. Like a year ago, one of the biggest ComfyUI use cases was you were generating an SD 1.5 or SDXL image and you were fixing the six-finger situation, you were fixing the resolution, you were upscaling. Now that the models are actually so good, the ComfyUI workflows are actually getting simpler on the image side. For video it's still very crazy. If you look at some video workflows, there's like 50 nodes, whatever, that you're processing. So I think this is still a matter of how good the models are and how much extra stuff you need to do around them. For the majority of use cases, for artistic use cases, you are still doing a lot of stuff. And that's something we want to support. But we don't see that happening at super scale. Like at super scale, there aren't companies that are spending like $10 million plus on running this as an API. So that doesn't seem to be happening yet, just because it's a bit inefficient, and it's more reliable to use an existing model than patching together 50 different things, because you don't know when it's going to not work.
A
Yeah, but it feels like for things like ads, you want to do one step which is like maybe generate the backdrop. One step is like adding copy Are you saying the models are so good?
B
Chaining of models is happening for sure. But what I think ComfyUI did very well is you can also play with the.
A
Pieces of the model design, that's all.
B
Yeah. So like chaining of the models, that's what the fal workflow product does. It basically calls many different APIs back to back or in parallel and then creates a result at the end, I think.
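To make the "back to back or in parallel" chaining concrete, here is a minimal Python sketch. The three step functions (generate_backdrop, generate_copy, compose_ad) are hypothetical stand-ins invented for this example; they are not fal's API, and a real workflow would make network calls where these return placeholder strings.

```python
import asyncio


# Hypothetical stand-ins for model endpoints (not a real API).
async def generate_backdrop(prompt: str) -> str:
    await asyncio.sleep(0)  # placeholder for a network round trip
    return f"backdrop({prompt})"


async def generate_copy(prompt: str) -> str:
    await asyncio.sleep(0)
    return f"copy({prompt})"


async def compose_ad(backdrop: str, copy: str) -> str:
    await asyncio.sleep(0)
    return f"ad[{backdrop} + {copy}]"


async def run_workflow(prompt: str) -> str:
    # The two independent steps run in parallel.
    backdrop, copy = await asyncio.gather(
        generate_backdrop(prompt),
        generate_copy(prompt),
    )
    # The dependent step runs afterwards, consuming both outputs.
    return await compose_ad(backdrop, copy)


result = asyncio.run(run_workflow("sneaker launch"))
print(result)
```

Only the outputs of each step are passed along here, which mirrors the limitation described earlier: intermediate state such as latents never crosses a step boundary.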
D
And it's very popular. We have like enterprise adoption from it, like from very big names.
C
Yeah. Amazing. I was just going to go into sort of broader topics. The first thing that comes to mind was requests for startups if you're not working on fal. But you see a lot of things in the ecosystem.
D
Right.
C
What's the most obvious thing that people should be working on?
D
More model companies. Go raise more money. Train models.
C
That's obviously good for fal and host.
D
Them on fal if you're not interested in training models. But like if other people are trained, that's amazing. Go raise more money. There's so much money.
B
Or like scale AI for image and video models, like data. Data collection. Like more. More prepared data sets for video models. Like effects, different camera angles. Yeah. Everyone seems to be reinventing the wheel when it comes to collecting that data.
C
Yeah.
B
I think it's a great opportunity for someone to come in and do this at scale.
C
So it's really interesting because I think this is what Together AI did with Red Pajama is they actually built a data set for language models to help people create more open language models that they can serve. So at some point actually it might make sense for you guys to do that.
D
The image data is a bit more finicky situation in terms of copyright stuff, whatever. But it's an interesting area.
C
Do it in Japan.
B
I think it requires focus. It requires like this needs to be.
D
One connected thing to what Gorkem said is image and video RL. That's an unknown. Unknown for us.
C
Say more.
D
I can't. It's like what does it look like? Can you RL a video model to be a world model? You can, right? Like if you consider it, it's essentially world models, RL'd video models where you condition it for, like, you know, moving around. So what are the use cases for RL'ing image and video models? I don't know. But that's like if I wasn't working at fal, that would be something that might be fun to explore.
B
Yeah.
C
And this is specifically for editing because the RL is for the reward is.
D
The edit or that's the thing that's what you should look for, right? The reward function. What is the interesting reward functions that you can apply on top of these base models. I see.
C
Interesting.
D
Okay, got it.
C
Actually I was really asking about if I want to build a fal wrapper.
D
Startup on top of.
C
Because you guys are very low level, which is fantastic. But I also want to give our listeners some ideas. If they're not going to work at that level.
B
I think I'm going to say it again. Advertising. There's so much opportunity there and everyone's still trying to create these. These horizontal applications where any creative can come in and do something. But a lot more targeted to specific industries. A lot more targeted to different kinds of ad networks. There's a lot of potential there.
C
And then requests for models. Obviously you want more open models, that's good for you. But any specialization in the models. I think image editing was a huge unlock which I didn't foresee until this year where we're like obviously we're going to even OpenAI.
D
Didn't guess it was going to go this big. It's insane how popular it became. And then everyone started catching up.
C
I was part of a group at NeurIPS that we meet at NeurIPS every year and they were talking about this at the last NeurIPS and so it's in the air but you have to.
B
Be at the researcher level and everyone moved to video, left image behind a little bit. So there was a little vacuum of research on image but luckily people saw it that it works very well and then they went back to it.
D
It's so much cheaper to train image models right now. If you want to train a SOTA image model, I don't think it's going to cost more than a million dollars. It's extremely cheap. It's like a matter of data engineering effort, cleaning. I think it's a function of the data set. Image models are really, really, really affected by the data set that you use.
B
I think one obvious thing where there's a gap in the market is like Veo 3 is very expensive, and the reason why people like it is conversation. Right? If you can create maybe a smaller, cheaper video model that is less capable but can do conversation and sound very well. I think there is definitely one open.
D
Source one that we saw was MultiTalk. It was a post-trained version of Wan and it's really, really good for conversations, but it lost the ability to generalize. It's only like talking faces at some point, versus Veo 3, it could generalize and it could do scenes and whatever. So I think there needs to be some middle ground between these two, between talking faces and extremely generalized video models, where it's much cheaper to run. But at the same time you get this conversation, because it's very memetic. There's an infinite amount of memes that you can post with this, an infinite amount of ads that you can do with this.
A
But you don't see a world in which you have a video model and then you have a separate, maybe audio only model that can generate the audio for that.
C
This is the word for this question, right? Do you stitch together a whole bunch of things or do you do better?
B
Fewer people did that before Veo 3. But what Veo 3 gets very well is like the timing. Almost like you ask for a joke and, you know, the delivery and the timing and the laugh and like, you know, waiting right before the joke drops. Like all of that is so perfectly timed. I don't think when you do it separately, you get it.
D
It also matches this human accent sound to the face that is talking, right? It's like, it's an unknown challenge for like other text to speech models. Like you like, it feels very natural.
B
Is Veo 3 the best text to speech model?
D
It is also one of the best. It is so good. Like, I don't think any model can do what, what is the commercial, but.
A
I would say the counter argument is that we dub movies. So there's already, you know, obviously you can.
D
It is also the best lip sync model. Like Veo 3 has the best, most accurate lip sync because it's generating very natively. There are really good lip sync models. I think they're like 95% there, but Veo 3 is like 100% there, like 99%.
C
To me, this is the single most bearish thing about workflows, right? And comfy UI and all this stuff because just wait for a bigger model. It's just pure bitter lesson.
D
Yeah, we love ComfyUI.
C
But obviously when the technology doesn't exist yet, you have to stitch together things. A request for engineers.
A
Yeah, I'm sure you're hiring, right?
D
You just raised 125 million hiring. Like we just recently crossed 4, 40 people. But like for like, wow, for like three months ago, we were at like 20. So like for the last three months we have been actually accelerating, you know, best kernel engineers, best infrastructure engineers, best product engineers, best ML engineers. If you're the best at what you do, just come join us. I think it doesn't really matter what you do. Just like, we're just Hiring the best sellers.
B
Right now, even on the go-to-market side we are hiring account executives, customer success managers, because we work with very large enterprises. We've got to grow that side of the company as well.
A
Yeah. On the engineering side specifically, how do you think about how many people you need? There's this whole question of like lean AI. It's like you know, coding engine performance.
D
Team is like seven people. I think seven people like focusing on performance always like some of there's some overlap with our applied ML team which is taking these models, productionizing them, exposing new capabilities, building fine tuners and then helping customers adapt these models. So that team I think we can scale to like double, triple the amount because there's infinite amount of models and like you know, it's better because we're going to have more customers with more proprietary models. So just helping them optimize it is just like a really good function that we have.
B
That team scales very well because there's always independent work that can be done. Oh okay. So these three people are working on this new model trying to optimize that and it's completely independent from trying to optimize this other model. So we've been hiring a lot for that AppliedML team.
D
Our infra team, we're probably going to keep lean in contrast to the applied ML team and the product team. Maybe we want to build more higher level components that people can directly integrate into their applications, because that's even like now SDK or SDKs. But think about with component. Imagine you're an e-commerce website designer and you're not really the best component designer. So here's a virtual try-on component that you can put into your app. Stuff like that, more higher level components. And this is also coming from the fact that vibe coding has been very, very insane. Revenue-wise it's very small, but we see a significant amount of user adoption just coming from people who are, just from looking at our support tickets, maybe they need more support. But there's a lot of people who are coding these applications without that much expertise in product building. So we want to provide them more guardrailed experiences where they can integrate much more easily without messing with all the other lower level components.
C
That's really nice care for developer experience.
A
So cracked low-level engineers and cracked high-level... well, yeah, high-level engineers in general. Cracked go-to-market people, cracked whatever. Just join fal.
C
You know, I'm always trying to refine the definition of cracked. Both of you lead the technical side of fal. What's a really hard technical problem that, if someone has the solution, they should talk to you immediately? Maybe that's the way to frame it.
D
Write a sparse attention kernel with FP8 on Blackwell and tell us. You know, like, if you can do that, come join us. We already have a good base.
B
Hired on the spot.
D
Hired on the spot. You know, stuff like that. I really like picking these people. Some of these applied ML people, we just picked them from Discords, people who were working on this sort of generative media and were already interested. We really have a high culture bar too, where everyone on the team loves generative media. Like, they're obsessed with it. They would have done this if this wasn't their job. We have this great composition. Obviously it's not a prerequisite, but it just naturally happened that we hired these people from Discords, Twitter, Hugging Face. Like, one of our applied ML engineers had the number one top Hugging Face Space with, you know, creative workflows, whatever. And we hired a person who was training LoRAs on fal, just because they were training and posting cool LoRAs. You know, just do cool stuff and we'll find you. Or, like, you can reach out to us.
C
The master builder is what I've been calling this person.
A
Why not make it more explicit? So if I go on your careers website, right, it's like "Applied ML Engineer." It just kind of looks like any job description. I feel like there's this question of, like...
C
That's why we have to do a podcast.
A
But I think it's not just about fal. I think in general, this is more...
D
Like, if you know, you know, which I know is not the best way. People know about fal already, so we haven't really cared that much. But you're absolutely right. We should make it more explicit.
A
If I look at, like, George Hotz on tinygrad, you have these bounties. It's like, hey, if you can solve this, you should probably work here.
B
Like, do you see I'm adding the bounty, right?
A
This seems like, hey, look, if you can write this kernel, it's like, yeah, you'll just get hired.
D
It is also. But one thing that we saw is, like, there's a lot of people who are just vibe coding stuff, and there's a limited number of people who can review those, right? So how can you tell it's not a shitty kernel.
A
Versus, like, a good one. Well, but then you're spending the time interviewing too, right?
D
Yeah, so we have, like, a first line of defense with our recruiters, whatever, so you only get, like... so there are trade-offs. But I absolutely agree. Maybe we should have a KernelBench version where you can upload your kernel and it automatically evaluates things like stability, performance, whatever, and then if you pass, you get our email unlocked, a special email for you. But yeah, great ideas. Come join us. Build this.
A
Awesome guys. Anything else? Parting thoughts?
C
Yeah, I love your rant.
B
This was great.
D
Yeah, I'm happy to rant but one.
B
Is the podcast style. No.
C
Congrats on all your successes. I should also say it's fun to do karaoke with you guys.
B
Yes, let's do it again.
C
Both extremely technical but also, like, a fun crew, which I think is pretty hard and rare to see, so thank you. Nice to see the good guys win.
A
Awesome guys. Awesome.
Date: September 5, 2025
Host: Latent Space ([Alessio], [Spiks], [Co-host])
Guests: Gürkan (Founder of FAL), Batu Han Fau (Head of Engineering, FAL)
This episode delivers a comprehensive technical and business history of generative media, focusing on the evolution, implementation, and scaling of image, audio, and video foundation models. The conversation features the journey of FAL, a leading generative media platform that optimizes inference for developers, and dives into the rapid advances and operational challenges of deploying such models at scale. The guests share stories of technical pivots, GPU infra, kernel engineering, industry trends, and business insights, revealing how foundation models are revolutionizing creative work, advertising, and technical stacks.
"The final biggest thing was Veo 3 where it actually created this usable text to video component. Where before, text to video was a very boring, soundless video... now it's such a great experience."
— Batu Han Fau (06:28)
"Our main objective is: for whatever GPU type you're using... we're going to extract the best performance."
— Batu Han Fau (14:19)
“When the user asks for an image and iterating on it, if it's slower to create, they're less engaged, they create fewer images...”
— Gürkan (16:20)
Model Architecture Trends:
Rise of Video and Editing Models:
"It's insane how quickly this stuff transitions just because there's a better quality model..."
— Batu Han Fau (39:28)
Open Source Ecosystem:
Composability & Pipelines:
Advertising is Key:
Enterprise Revenue Shift:
On Industry Pivot:
“We chose to be a leader in this fast growing niche market rather than trying to go against Google or OpenAI or Anthropic... So far it's been growing fast enough that we were able to build a whole company around it.”
— Gürkan (08:14)
On Model Release Excitement:
“The best part of the job is the day of a model release, the adrenaline rush that comes with it. The whole team trying to scramble something together and release it.”
— Gürkan (04:14)
On Latency & Product Impact:
“When the user asks for an image and iterating on it, if it's slower to create, they're less engaged, they create fewer number of images...”
— Gürkan (16:20)
On AI in Ads:
“With advertising it's the exact opposite. The more content there is... the more personalized it is, the more economic value there is behind it.”
— Gürkan (42:08)
On Open Source Pace:
“Whenever we see someone actually pushing the frontier, it's a reason for excitement because now that's possible. Other people are just going to do it within a couple months, so we don't panic anymore.”
— Gürkan (29:38)
On Fine-tuning & LoRAs:
“Only open source models have these rich LoRA ecosystems and it's extremely, extremely popular... People can train their LoRAs under 30 seconds now on the platform and get like 99% accuracy...”
— Batu Han Fau (44:42–46:30)
On Technical Recruitment:
“If you can write a sparse attention kernel with FP8 on Blackwell, you’re hired on the spot.”
— Batu Han Fau (58:16)
This episode paints a vivid picture of the dynamic generative media space, showing how focused technical leadership, community energy, and practical market focus (ads, user UIs, creative tools) drive both technical innovation and business success. From performance kernel wizardry to enterprise partnerships, FAL’s journey summarizes the new playbook for AI-native infrastructure—and reveals the opportunities still to be built for the Software 3.0 generation.