Latent Space: The AI Engineer Podcast
Episode: A Technical History of Generative Media
Date: September 5, 2025
Hosts: Alessio and swyx (Latent Space)
Guests: Gürkan (Founder of FAL), Batuhan (Head of Engineering, FAL)
Episode Overview
This episode delivers a comprehensive technical and business history of generative media, focusing on the evolution, implementation, and scaling of image, audio, and video foundation models. The conversation features the journey of FAL—a leading generative media platform that optimizes inference for developers—and dives into the rapid advances and operational challenges of deploying such models at scale. The guests share stories of technical pivots, GPU infra, kernel engineering, industry trends, and business insights, revealing how foundation models are revolutionizing creative work, advertising, and technical stacks.
Key Discussion Points & Insights
1. FAL’s Company Evolution and Scale
- Origins:
- Started as a feature store, pivoted to building a cloud Python runtime, then an inference system, and ultimately landed on generative media (image, video, audio) (00:33).
- Growth:
- 2 million developers, 350+ models hosted, $100M+ revenue, $125M Series C round (02:11, 02:50).
- Model Selection:
- Prioritize unique models that address real gaps in the stack, using in-house evals and community signals (03:08).
- "We don't add a model that's significantly worse in any aspect compared to other models that we have..." – Batu Han Fau (03:08).
2. A Timeline of Generative Media Model Breakthroughs
- Stable Diffusion 1.5:
- First hit and key pivot point, widely adopted, drove need for optimized cloud inference (04:41).
- Stable Diffusion XL (SDXL):
- Led to first million in revenue; ecosystem exploded due to face/object fine-tuning (05:05).
- Flux Models:
- Marked the transition to commercial/enterprise-grade quality; revenue jumped from $2M to $10M in a month (05:35).
- Era of Video Models:
- Partnerships with Luma Labs and Chinese labs (Kling, MiniMax).
- "VO3" with Google DeepMind unlocked usable text-to-video, meme/ads, huge usage spike (06:28).
"The final biggest thing was VO3 where it actually created this usable text to video component. Where before, text to video was a very boring, soundless video... now it's such a great experience."
— Batuhan (06:28)
3. Technical Decisions: The Value of Focusing on Generative Media
- Strategic Pivot:
- Deliberately chose not to compete in large language models (LLMs), focusing instead on the net-new, under-served market of generative media (07:16–09:37).
- Avoided LLM commoditization and competition with tech giants; emphasis on market leadership and defining new use-cases.
- Kernel Engineering and Performance:
- Early lack of kernel specialists meant “so many low hanging fruits that we started to pick up and start optimizing” (10:05).
- Built custom kernels and developed inference engines for various architectures, constantly chasing new model, architecture, and chip releases to maintain performance leadership (11:59–14:57).
- Performance gains can be dramatic, anywhere from 1.5x to 10x depending on hardware and model (14:13–14:57); a brief illustrative sketch follows the quote below.
"Our main objective is: for whatever GPU type you're using... we're going to extract the best performance."
— Batuhan (14:19)
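To make the kernel discussion concrete, here is a minimal, hypothetical benchmark (not FAL's code) comparing a naive PyTorch attention against the fused scaled_dot_product_attention kernel. It assumes PyTorch 2.x on a CUDA GPU; the shapes are arbitrary, and the measured speedup will vary by hardware and model, which is exactly the 1.5x to 10x spread described above.

```python
# Illustrative only: the kind of gap that custom/fused kernels close.
# Assumes PyTorch >= 2.0 and a CUDA GPU; numbers vary by hardware and shapes.
import time
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Materializes the full (seq x seq) score matrix: extra memory traffic.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def bench(fn, *args, iters=20):
    fn(*args)  # warm-up (kernel selection, caching)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

q, k, v = (torch.randn(4, 8, 2048, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))
t_naive = bench(naive_attention, q, k, v)
# scaled_dot_product_attention dispatches to fused, FlashAttention-style kernels.
t_fused = bench(F.scaled_dot_product_attention, q, k, v)
print(f"naive: {t_naive * 1e3:.2f} ms, fused: {t_fused * 1e3:.2f} ms, "
      f"speedup: {t_naive / t_fused:.1f}x")
```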
4. Latency, Scale, and Infra
- Latency Matters:
- Direct impact on user engagement and business metrics—“it’s almost like page load time” (16:20).
- A large A/B test showed that slowing image generation causes users to drop off (16:20–17:01).
“When the user asks for an image and iterating on it, if it's slower to create, they're less engaged, they create fewer images...”
— Gürkan (16:20)
- Managing Serverless GPUs:
- Built multi-cloud orchestration, a distributed file system, and custom container runtimes for ultra-fast scaling and cold starts (21:13); see the client-side sketch below.
- Over 10,000 H100 GPUs running worldwide (22:14).
- Custom moderation and enterprise content filters offered (22:23).
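For context on what "serverless GPUs" look like from the developer side, here is a minimal hypothetical sketch using the fal_client Python package, timing the end-to-end latency the hosts argue drives engagement. The endpoint ID and response fields are assumptions for illustration; check the current FAL docs for exact values.

```python
# Hypothetical client-side call to a serverless model endpoint, with timing.
# The endpoint ID and response schema below are illustrative assumptions.
import time
import fal_client  # pip install fal-client; expects FAL_KEY in the environment

start = time.perf_counter()
result = fal_client.subscribe(
    "fal-ai/flux/dev",  # example endpoint ID (assumption)
    arguments={"prompt": "a watercolor fox, studio lighting"},
)
elapsed = time.perf_counter() - start

print(f"end-to-end latency: {elapsed:.2f}s")
print(result["images"][0]["url"])  # response schema is an assumption
```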
5. Business Model and Revenue Structure
- Collaboration with Labs:
- Hosts both open- and closed-source models; sometimes receives model weights for deep optimization, acting as a cloud infra partner for research labs (18:01–18:56).
- Model Hosting/Economics:
- Labs may release distilled open models alongside commercial/pro versions with revenue share, creating new monetization paths for model developers (37:17).
6. Evolving Model Architectures and Community Impact
- Model Architecture Trends:
- Constant innovation—researchers “change things for the sake of changing things" (25:47).
- Distillation, ControlNets, and LoRA fine-tunes all move in and out of favor depending on current research breakthroughs and market needs (26:03–27:29).
- Rise of Video and Editing Models:
- Video models now account for more than 50% of FAL revenue (33:28).
- Chinese labs (Alibaba, ByteDance, Stepfun) are releasing high-quality models rapidly, often cost-effectively (34:24, 36:08).
"It's insane how quickly this stuff transitions just because there's a better quality model..."
— Batuhan (39:28)
- Open Source Ecosystem:
- The long tail of models sees real use, not just a power law; the market-leader position means FAL gets day-0 launches and attention (39:28–39:51).
- The LoRA customization ecosystem keeps older models relevant; open models foster a thriving fine-tune/modification culture (44:28–46:30). See the first sketch after this list.
- Composability & Pipelines:
- fal Workflows product for chaining models; support for serverless ComfyUI (user-built workflows for image/video generation) (47:35). A chaining sketch also follows this list.
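As a concrete illustration of the LoRA point above, the sketch below loads a community-trained LoRA onto an open base model with the diffusers library. It is a generic example, not FAL's pipeline; the LoRA repository name is a placeholder.

```python
# Why open weights breed a LoRA ecosystem: anyone can stack a community LoRA
# on top of a base model. Generic diffusers sketch; repo IDs are placeholders.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("some-user/some-style-lora")  # placeholder LoRA repo
image = pipe("a portrait in the fine-tuned style", num_inference_steps=30).images[0]
image.save("styled.png")
```

And to illustrate the chaining idea behind workflow products, here is a hypothetical two-step pipeline (text-to-image feeding image-to-video) written against the fal_client package. The endpoint IDs and response fields are assumptions; a workflow product would express this composition declaratively rather than in ad hoc client code.

```python
# Hypothetical two-step chain: a text-to-image result feeds an image-to-video model.
# Endpoint IDs and response fields are assumptions, not documented schema.
import fal_client

image_result = fal_client.subscribe(
    "fal-ai/flux/dev",  # assumption
    arguments={"prompt": "a neon-lit street market at night"},
)
image_url = image_result["images"][0]["url"]  # schema assumption

video_result = fal_client.subscribe(
    "fal-ai/kling-video/image-to-video",  # assumption
    arguments={"image_url": image_url, "prompt": "slow dolly-in, light rain"},
)
print(video_result["video"]["url"])  # schema assumption
```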
7. Use Cases and Enterprise Adoption
- Advertising is Key:
- Dynamic, personalized video and image ad generation is where generative media has the most business value (41:31–42:35).
- Hollywood/film disruption is “overhyped”—the ad sector “fits really well... there is always economical value attached” (42:08).
- Enterprise Revenue Shift:
- Customer base is increasingly enterprise; minimal NSFW workload, heavy moderation, and a focus on product quality and compliance (40:37–41:21).
Notable Quotes & Memorable Moments
- On Industry Pivot:
“We chose to be a leader in this fast growing niche market rather than trying to go against Google or OpenAI or Anthropic... So far it's been growing fast enough that we were able to build a whole company around it.”
— Gürkan (08:14)
- On Model Release Excitement:
“The best part of the job is the day of a model release, the adrenaline rush that comes with it. The whole team trying to scramble something together and release it.”
— Gürkan (04:14)
- On Latency & Product Impact:
“When the user asks for an image and iterating on it, if it's slower to create, they're less engaged, they create fewer number of images...”
— Gürkan (16:20)
- On AI in Ads:
“With advertising it's the exact opposite. The more content there is... the more personalized it is, the more economic value there is behind it.”
— Gürkan (42:08)
- On Open Source Pace:
“Whenever we see someone actually pushing the frontier, it's a reason for excitement because now that's possible. Other people are just going to do it within a couple months, so we don't panic anymore.”
— Gürkan (29:38)
- On Fine-tuning & LoRAs:
“Only open source models have these rich Lora ecosystems and it's extremely, extremely popular... People can train their Loras under 30 seconds now on the platform and get like 99% accuracy...”
— Batuhan (44:42–46:30)
- On Technical Recruitment:
"If you can write a sparse attention kernel with FB8 on Blackwell, you’re hired on the spot.”
— Batuhan (58:16)
Timestamps for Important Segments
- Introduction & Company Origins: 00:03 – 02:11
- Platform Scale, Revenue & Model Portfolio: 02:11 – 03:41
- Model Evaluation & Community Trends: 03:41 – 04:28
- Technical History: Stable Diffusion to Flux/Veo 3: 04:28 – 06:41
- Strategic Pivot Discussion: 06:41 – 09:37
- Kernel Engineering & Optimization: 09:37 – 14:57
- UX/UI & Latency Impact: 15:43 – 17:01
- Open/Closed Model Partnerships: 17:45 – 18:56
- Infra, Serverless GPUs, Scaling: 21:13 – 22:15
- Model Architecture Evolution & Community: 24:48 – 27:29
- Video Models & Market Explosion: 33:19 – 36:46
- China’s Rise, Model Distribution: 36:08 – 39:51
- Enterprise Adoption, Use Cases, Ad Market: 41:31 – 43:59
- Customization, LoRAs, User Workflows: 44:28 – 46:30
- ComfyUI, Workflows, and Pipelines: 47:18 – 49:41
- Requests for Startups/Models: 49:50 – 53:32
- Recruiting & Talent Philosophy: 55:33 – 59:10
Requests, Reflections & Predictions
- Biggest Technical Gaps:
- Scaling data for model training, better RL/reward functions for editing and video/world models, and maintaining an infra performance edge with each new generation of GPUs and models (50:05, 51:24, 58:16).
- Startups Should:
- Build new models, develop ad-specific creative tools, or provide higher-level components (like virtual try-on for e-commerce) (51:51).
- On Model Trends:
- Cheaper, smaller, more conversational video models are the next big gap/opportunity (53:13).
- On Team Culture:
- FAL seeks master builders—engineers obsessed with generative media, discovered via their actual creative output (58:24).
Concluding Thoughts
This episode paints a vivid picture of the dynamic generative media space, showing how focused technical leadership, community energy, and practical market focus (ads, user-facing UIs, creative tools) drive both technical innovation and business success. From kernel-level performance wizardry to enterprise partnerships, FAL's journey exemplifies the new playbook for AI-native infrastructure, and reveals the opportunities still to be built for the Software 3.0 generation.
