Summary

Podcast Summary: The a16z Show – AI, Design, and the Power of Open Models

Date: June 15, 2026
Host: Andreessen Horowitz (A16Z)
Guests: Yucca Lee, Justine Moore (A16Z), Mohamed Noorouzi (Founder & CEO, Ideogram)

Episode Overview

This episode of The a16z Show dives into the intersection of artificial intelligence, design, and open-source models in creative technology. Yucca Lee and Justine Moore from a16z interview Mohamed Noorouzi, founder and CEO of Ideogram, about the recent release of Ideogram’s open-weights image generation model. They explore the motivation behind releasing open-source models, innovation in controllable image generation, the technical advances enabling high-quality outputs with small models, and the future of customizable creative AI for both professionals and enterprises.

Key Discussion Points & Insights

1. Why Release an Open-Source Model?

Motivation for Open Weights: Ideogram had previously released only closed models. This time, they opened the model weights to expand collaboration with app developers, inference providers, enterprises, and chip makers.
- Quote: “By releasing the weights we are actually extending ourselves...working with inference providers, large enterprise...It's basically us saying, hey, we are very serious about building the foundation model and we would like to work with you wherever you are.” — Mohamed Noorouzi [02:05]

2. Small but Powerful: 9.3 Billion Parameters

Technical Achievement: Ideogram’s model is significantly smaller than previous state-of-the-art models (9.3B vs. roughly 80B parameters, about a 9x difference), leading to efficiency and wider accessibility.
Significance: The model can run on consumer GPUs, increasing democratization and privacy for creative workflows.
- Quote: “It's 9.3 billion parameters. Previously the SOTA is probably like 80 billion...you can run it on a single GPU...opened up opportunity for people to use it.” — Justine Moore [17:57]
- Quote: “We focused on innovation...We think now is actually a good time for us to scale...given the quality of the model at 9.3 billion parameters. You should imagine what if this model is 100x bigger…” — Mohamed Noorouzi [18:25]
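As a rough illustration of why the parameter count matters for single-GPU use, the sketch below estimates the memory needed just to hold the weights at a few common numeric precisions. The 9.3B and 80B figures come from the episode; the precisions and byte sizes are generic assumptions, not formats Ideogram has stated, and the estimate ignores activations, text encoders, and other runtime overhead.

```python
# Back-of-the-envelope weight memory: parameters x bytes per parameter.
# Precisions below are generic assumptions, not Ideogram's published formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the memory needed to store the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in [("9.3B model", 9.3e9), ("80B model", 80e9)]:
        sizes = ", ".join(
            f"{p}: {weight_memory_gb(params, p):.1f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name} -> {sizes}")
```

Under these assumptions, bf16 weights take about 18.6 GB for the 9.3B model versus about 160 GB for an 80B model, which is the gap behind the "single GPU" remark.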

3. Innovations in Image Generation & Editing

Prompting & Layout Control: A key innovation is detailed prompting using structured JSON to give users granular control over elements like layout, font, and text within generated images.
- Quote: “We went really detailed on the prompting...layout control, bounding box and a number of elements. This unlocks a lot of design use cases...this model is very versatile.” — Mohamed Noorouzi [03:34]
Editable Text and Layout: The upcoming feature Noorouzi is most excited about is editable text and layout—moving beyond generating flat images to producing adaptable, design-ready assets.
- Quote: “I’m personally most excited about…editable text and layout control…for a lot of design and marketing use cases, we need editable design, not a single flat image.” — Mohamed Noorouzi [03:34]
Accurate Text Rendering: Ideogram’s model excels at rendering long, accurate textual content in images—an ongoing pain point in generative image AI.
- Quote: “…people talk about the quality of typography, the quality of text. We are known for really stylized typography, for logo, T shirt design, graphic design…” — Mohamed Noorouzi [05:24]

4. Training Techniques & Benchmarks

Innovative Data Processing: Instead of relying on short alt text from the internet, Ideogram uses models to generate detailed textual annotations and bounding box info from images, then trains the image models with this data.
- Quote: “We train models to go from image to text, and in this case image to text with detailed bounding box information...and then we go from text to image backwards.” — Mohamed Noorouzi [07:11]
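The two-step recipe described here (image to detailed text, then text back to image) can be sketched as a small data-preparation step. Everything named below is hypothetical and only for illustration: the `Element` and `Annotation` structures, the pixel bounding-box format, and the `fake_captioner` stand-in are assumptions, not Ideogram's actual pipeline or schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Element:
    """One annotated element; bbox is (x0, y0, x1, y1) in pixels (assumed format)."""
    description: str
    bbox: tuple
    text: Optional[str] = None  # literal text rendered in this region, if any


@dataclass
class Annotation:
    caption: str
    elements: list = field(default_factory=list)


# Hypothetical stand-in type for an image-to-text model: image path -> annotation.
Captioner = Callable[[str], Annotation]


def build_training_pairs(image_paths: Iterable[str], captioner: Captioner) -> list:
    """Step 1: image -> detailed annotation. The resulting (annotation, image)
    pairs are what a text-to-image model would then be trained on (step 2)."""
    return [(captioner(path), path) for path in image_paths]


def fake_captioner(path: str) -> Annotation:
    # Placeholder output so the sketch runs without a real model.
    return Annotation(
        caption=f"A travel poster ({path})",
        elements=[Element("headline", (40, 30, 600, 120), text="VISIT TORONTO")],
    )


pairs = build_training_pairs(["poster_001.png"], fake_captioner)
print(pairs[0][0].elements[0].text)
```

The point of the structure is that the annotation, not short alt text, becomes the conditioning input, so any text in the image and its position are described explicitly.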
Tuning for Taste and Realism: Manual evaluation with designers was done to push 'taste'—a hard-to-benchmark, yet crucial aspect of quality and differentiation.
- Quote: “We care about our own internal evaluation...we worked with designers...side by side comparisons between different versions of the model...to really push on the taste.” — Mohamed Noorouzi [17:01]
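One simple way to summarize side-by-side designer comparisons is a win rate. The sketch below is a generic illustration, not Ideogram's evaluation method: the vote labels, the choice to count a tie as half a win, and the sample data are all assumptions.

```python
from collections import Counter

# Each record is one designer judgment on one prompt: "a", "b", or "tie".
# The votes below are made-up illustration data, not Ideogram results.
votes = ["a", "a", "b", "tie", "a", "b", "a", "tie"]


def win_rate(votes: list, model: str = "a") -> float:
    """Fraction of comparisons won by `model`, counting a tie as half a win."""
    counts = Counter(votes)
    return (counts[model] + 0.5 * counts["tie"]) / len(votes)


print(f"Model A win rate: {win_rate(votes):.3f}")
print(f"Model B win rate: {win_rate(votes, 'b'):.3f}")
```

Counting ties as half a win keeps the two rates summing to 1, which makes successive model versions easy to compare on the same prompt set.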

5. JSON Prompting and Professional Applications

Why JSON?: The model expects JSON-structured prompts rather than raw natural language for fine-tuned control. While the open-source community resists JSON, Noorouzi explains it’s essential for predictable, high-quality results.
- Quote: "This model is only trained with JSON prompting and you have to provide JSON with that particular structure for you to get good quality output." — Mohamed Noorouzi [09:59]
- Quote: “For professional use cases, you don't want to just roll the dice...We show you the actual input to the model as well in the JSON format, and we think that will foster more innovation and creativity.” — Mohamed Noorouzi [12:41]
Implication for Creative Iteration: Precise JSON prompts allow for regional edits and consistency, critical for branding and large-scale design workflows.
- Quote: “You can take it and change one element in the scene and that results in very, very consistent output...this also has a big implication on editing.” — Mohamed Noorouzi [14:04]
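To make the "change one element" idea concrete, here is a minimal sketch of a structured prompt and a single-field edit. The field names (`scene`, `elements`, `bbox`, `font`) and coordinate values are hypothetical and chosen only for illustration; the episode does not spell out Ideogram's real schema, so consult the model's documentation for the actual structure.

```python
import copy
import json

# Hypothetical prompt structure; field names are illustrative assumptions only.
base_prompt = {
    "scene": "Minimalist coffee shop poster on a cream background",
    "elements": [
        {"type": "text", "content": "MORNING BREW", "font": "bold serif",
         "bbox": [80, 60, 944, 220]},
        {"type": "object", "description": "steaming ceramic cup, top-down view",
         "bbox": [312, 300, 712, 700]},
    ],
}


def edit_element(prompt: dict, index: int, **changes) -> dict:
    """Return a copy of the prompt with one element changed, leaving the rest intact."""
    edited = copy.deepcopy(prompt)
    edited["elements"][index].update(changes)
    return edited


variant = edit_element(base_prompt, 0, content="EVENING BREW")
print(json.dumps(variant, indent=2))
```

Because every other field is carried over unchanged, the second prompt differs from the first only in the headline text, which is the property the quote credits for consistent outputs and regional edits.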

6. Customization: The Next Frontier

Customization for Artists & Enterprises: The open weights model lets individual creators and companies fine-tune the model for unique artistic styles or brand identities.
- Quote: “…every artist...can really customize this model to the nuances of their style, the texture of their canvas, and really get 2k output and hopefully make that part of their workflow.” — Mohamed Noorouzi [20:54]
Enterprise Demand: Enterprise customers desired model fine-tuning for brand consistency, driving the need for open-sourcing.
- Quote: “Companies come to us...‘these generic models...don’t follow our brand guideline.’ And once we train custom models for them, they're like, ‘wow, this understands my brand DNA.’” — Mohamed Noorouzi [22:58]

7. Editing vs. Fine Tuning

Complementary Roles: Editing is powerful for quick iteration, but deep customization still benefits from fine-tuning a model to a specific style or set of constraints.
- Quote: “Editing is very powerful...it's quick...But then customization gives you freedom to not prompt at all...I don't think they are mutually exclusive, but they're both very powerful.” — Mohamed Noorouzi [26:26]

8. Agentic, Iterative Design Workflows

API & Agents for Creative Work: The future of creative work involves agentic loops—where AI agents call APIs to generate, edit, and iterate on creative assets at scale, supervised by designers for final curation.
- Quote: “We are very, very excited about agentic workflows...you can go into your agent...connect to the API and generate a bunch of images...in a couple hours, you have your landing page up and running.” — Mohamed Noorouzi [29:24]
UI Still Matters: While agentic and API-driven flows can help scale exploration, human-guided UI and UX are vital for precise finalization and iterative edits.
- Quote: “Once you know what you want, you need a UI, you need a UX to go and edit...kudos to the best designers who understand how these models work and are trying to figure this part out.” — Mohamed Noorouzi [31:49]

9. Representation Layer: JSON, HTML, or Beyond?

The Path Forward: Ideally, intermediate representations should be accessible to both language models and humans. HTML may replace custom JSON for model instruction because LLMs are already trained on it.
- Quote: “Would you have to design your own version of HTML or would you align to HTML?...Seems like HTML makes more sense just because large language models have been trained on HTML.” — Mohamed Noorouzi [38:17]

Notable Quotes & Moments

On Model Taste:
“We really want our models to have taste...One element of taste is going outside the norm a little bit and not conforming to the average opinion.” — Mohamed Noorouzi [15:55]
On Customization Impact:
“For enterprise...we give a glimpse of customization...developers within enterprise, and scale this side of our business further.” — Mohamed Noorouzi [22:58]
On Democratizing Design:
“I saw a tweet before coming here that somebody said, I have no design training and I got this design in two minutes and it actually looked really, really nice.” — Mohamed Noorouzi [33:12]
On Artistic Diversity:
“We tried to be enabling very different styles, and that was one of our goals. We still want to produce tasteful output, but that doesn't mean we have to force a complex output to you. If you want a minimalist design, you should be able to get that...” — Mohamed Noorouzi [35:28]

Timestamps for Key Segments

[00:57] Episode setup and guest introductions
[02:05] Ideogram’s shift to open weights and focus
[03:34] Model innovations: control, editable text, layout
[05:24] The importance and challenge of accurate text in images
[07:11] Training methods and data innovations
[09:47] JSON prompting, community feedback, pros and cons
[11:31] JSON as intermediate representation—bridge between natural language and images
[12:41] Professional control and implications of JSON prompts
[17:01] Model evaluation, taste, and internal benchmarks
[18:25] Building a high-performing small model: why and how
[20:54] Customization, artistic, and enterprise use cases
[22:58] Enterprise needs and model fine-tuning
[24:26] Customization for artists and businesses, workflow details
[26:26] Editing, fine-tuning, and creative iteration
[28:42] API, agentic workflows, and the future of creative APIs
[31:49] Scaling creativity, the designer’s evolving role
[34:37] Artistic diversity vs. uniform outputs in frontier models
[38:17] Evolution of model representation: JSON vs. HTML

How to Get Involved / Next Steps

For Engineers & Collaborators:
Ideogram welcomes experienced engineers and collaborators interested in open-source and creative AI projects.
Contact: partnerships@ideogram or DM Mohamed Noorouzi on Twitter/LinkedIn. [39:15]
For Artists & Enterprises:
Fine-tune your model via Ideogram’s online platform (Model tab)—minimum 15 images, with tailored solutions for enterprise clients. [40:44]

Overall Tone & Takeaways

The discussion is optimistic, practical, and highly open to community involvement—reflecting the open-source ethos and the ambition to democratize high-quality creative tools. Noorouzi emphasizes both the technical breakthroughs (small, controllable models) and the philosophy behind building platforms that uplift both individual creators and businesses, reinforcing a vision where powerful, customizable AI becomes essential in future design and branding work.

Transcript

[00:00]
Yucca Lee
You.
[00:06]
Mohamed Noorouzi
It's not about how good a model is in the general sense, it's about how good is this model for my use case. For a lot of design and marketing use cases we need editable design, not a single flat image.
[00:20]
Yucca Lee
It's super impressive, honestly, reaching the level of things like Nano Banana or GPT Image with an open source model. Why did you think that was important?
[00:28]
Mohamed Noorouzi
We really want our models to have taste. Every artist, they can really customize this model to the nuances of their style, the texture of their canvas and really get 2k output and hopefully make that part of their workflow.
[00:43]
Justine Moore
One thing we were always wondering is that this released open source model is so small. It's 9.3 billion parameters. Like, previously, the SOTA was probably like 80 billion parameters. It's like a 9x difference. How did you do it?
[00:57]
A16Z Podcast Host
Image generation has improved dramatically over the last few years. The next challenge is not simply creating images, but giving users more control over what gets created and how. That includes everything from typography and layouts to editing, customization and workflows that fit into professional creative processes. Yucca Lee and Justine Moore speak with Ideogram founder and CEO Mohamed Noorouzi about image generation, open weight models, design tools and the future of creative AI.
[01:34]
Yucca Lee
So today we're excited to have Mohamed, CEO and founder of Ideogram, a Toronto-based generative AI company that just released their first open weights image model. Congrats on a huge release.
[01:44]
Justine Moore
Congratulations.
[01:46]
Mohamed Noorouzi
Thanks for having me.
[01:47]
Yucca Lee
We're really excited to talk through something that everyone has been buzzing about, which is the fact that the model is open weights. The previous Ideogram models have been closed source, so would love to hear how you made the call to make it open this time.
[02:00]
Mohamed Noorouzi
What has happened is there has been a lot of progress in industry, and we used to do everything. Basically we had our own first party app as well as our own first party API, and model development itself is a lot of work, so we decided to focus a little more on the model side. We think that's where a lot of potential exists. We still want to continue to own the interaction with the users. We think there's a lot of important feedback we can get from the users directly. But then we want to focus more on building the model, and by releasing the weights we are actually extending ourselves and working with inference providers, working more directly with large enterprise. They have every ability to customize the model or host it on-prem or optimize it for device, and we would love to work with the best chip makers to really optimize the model, the best inference providers. So this is basically us saying, hey, we are very serious about building the foundation model and we would like to work with you wherever you are, whether you're an app developer or a chip maker or an inference provider.
[03:15]
Justine Moore
I think you already kind of touched on this. The new open source model is very exciting in that it unlocks a lot of new use cases. It's very photorealistic, and I think it can generate up to 2K with a smaller model too. Obviously there's very precise layout control as well. Do you want to talk about some of the net new use cases that are unlocked by this model?
[03:35]
Mohamed Noorouzi
So this is actually a foundation based on which we're going to release some more exciting features next. This is just the first release, just testing the waters, figuring out how to work with Hugging Face and the open source community, ComfyUI, et cetera. What I'm personally most excited about is something we haven't released yet, which is editable text and layout control. And I really believe for a lot of design and marketing use cases we need editable design, not a single flat image. And we haven't released that yet. We kind of show the teaser in our video, but I'm personally most excited about that. On the technical side, what we've done is we went really detailed on the prompting, and if you look at our prompts, it's like thousands of words: each element in the image, where it is in the image. We have layout control, bounding box and a number of elements. And that's one of the key innovations here that unlocks a lot of, again, design use cases, because you clearly want font control, you want layout control, and this model is very versatile. It allows you to really fix certain elements, fix positioning and control the image generation in every detail possible.
[04:57]
Yucca Lee
Amazing. And one of the other things I immediately noticed about the model was how you can render super long texts, like paragraphs of text, completely accurately, which you either give the model in the prompt or you ask the model to kind of come up with something, and it does it really well. It's super impressive, honestly, reaching the level of things like Nano Banana or GPT Image with an open source model. Was that something you guys really focused on, and why did you think that was important?
[05:24]
Mohamed Noorouzi
I don't know if you remember, but we released the very first model three years ago, and at the time image generation was synonymous with garbled text, and there were memes about DALL-E 2 generating travel posters with incorrect city names, which is fun to look at. So I remember at the time there were just a few people building these models, and the question was, how can we differentiate, what's unique about our model? And we said, okay, text generation, accurate text is something we have. And then we released it and we were really surprised. So many people were so excited about text generation. And then we realized, oh, actually that's the whole graphic design and storytelling industry. Text is a very important part of image generation. And that became a very important part of our brand. So if you search Ideogram, people talk about the quality of typography, the quality of text. We are known for really stylized typography, for logo, T-shirt design, graphic design in general. And so we continue to push forward. I think our previous model wasn't really beating the state of the art in text generation, but we continued to focus on that. And we had a bunch of research breakthroughs. And with this model, despite the fact that it's very tiny, the text generation is very, very accurate.
[06:46]
Justine Moore
One of the things that stood out to us, which is what the community has been chatting about, is how there are new ways of processing data as you're training the model, where you kind of let the model learn what a bounding box is and how to do the layering and color palettes. Do you want to talk more about some of the innovations you had during the training process, on what made this model so good with these different shining features?
[07:11]
Mohamed Noorouzi
Yeah, it's kind of difficult to exactly describe what resulted in such an amazing model. I think a lot of it is focus and evaluation. Evaluating image models is actually a very difficult thing to do. There are lots of benchmarks out there, but people look at them and they're like, okay, this doesn't correlate with the pixel fidelity that I care about, or realism. You don't really want novice users to judge the quality of these models because they may be looking at small monitors that aren't really adjusted for color accuracy. And we always cared so much about quality, photorealism and, again, text accuracy. So throughout training, we always measure text accuracy and we apply very detailed changes to the model and data and see how that results in performance. So I would say a lot of it is really listing all the possible changes and very carefully tuning each element of the model and seeing what happens. Obviously, we try to gather as much data as possible. One of the standard recipes in the industry is that we take images and we turn them to text using visual language models. The very first models we were training three, four years ago would be based on the alt text that you can find on the Internet. That is, each image on the Internet may have an alt text field associated with it which describes what's in the image. But the problem is the alt text is often very short or inaccurate. And what we do now is we train models to go from image to text, and in this case image to text with detailed bounding box information, detailed element information. We care about text, and we really want to make sure all the text in the image is correctly described, and then we go from text to image backwards. So it's kind of interesting: we gather all the images from the Internet. Some of them may have alt text, some of them may not have alt text. And then we use AI to go from image to text, and then we train another AI model to go from text to image.
So that's one of the key recipes that results in very good models.
[09:28]
Justine Moore
I saw a lot of JSON prompting in your technical blog, which is very unique. And as I was trying out the model, it seemed like it was translating the text prompt to a JSON representation with implicit structure. Do you think JSON is the representation for image models going forward, or do you think there's another representation there?
[09:48]
Mohamed Noorouzi
It's a very good question. I don't know if you've seen the open source community is a little upset because of the safety image that shows up.
[09:57]
Yucca Lee
They always are.
[10:00]
Mohamed Noorouzi
Reddit was really lashing out at our engineers, and one of our people said, oh, we might fix this. And they were like, we might. But the fact is the community needs to also read the documentation and bear with us. This model is only trained with JSON prompting and you have to provide JSON with that particular structure for you to get good quality output. So I don't know if this is a feature or a bug. We did have some safety built into the model, but that is also detecting incorrect prompts. So if you just give it a one-word prompt, then you get this "image is blocked by safety" image back. But that's because your prompt is not a well-specified JSON. Now, we don't want people to write in JSON. We don't think that's a natural way of interacting with these models. But I do strongly believe that we need to use all the AI innovation to build the best image generation and editing models. And there has been a lot of progress in language models in the text space. So the question is, if you want to go from some vague idea to an image, what's the exact process, how much of the thinking happens in the language space and how much of the thinking happens in the actual kind of pixel generation space. I know you're an artist, so you should probably tell us
[11:32]
Justine Moore
then.
[11:32]
Mohamed Noorouzi
For example, we always had this prompt, Meaning of Life. We test our models based on this prompt. So if you have Meaning of Life as your prompt, then do you want an image and diffusion model to decide what the meaning of life is, or do you want a language model to think and go back and forth and come up with a description of a scene that's explaining the meaning of life? And that's kind of the context of JSON. Prompting is the intermediate representation that we think language models can describe images in that format, and then image generation can happen. In general, we see a lot of editing happening in the field, and that's the new frontier. So I don't think we should expect the interaction to be only through text or JSON, but it's a combination of JSON and image. If I were to make a guess.
[12:25]
Yucca Lee
Awesome. So it sounds like a lot of it is basically taking often a relatively simple prompt that someone puts into the model and then translating it on the back end with the magic prompt into JSON so that the model can make something probably more detailed and interesting than the person with the short prompt may have even imagined.
[12:42]
Mohamed Noorouzi
Exactly. And then I think everybody else does it too. OpenAI does it, Google does it. But then they don't give you the actual input to the model. But again, for professional use cases, you don't want to just roll the dice and then get some other completely different image interpretation of your prompt. We show you the actual input to the model as well in the JSON format, and we think that will foster more innovation and creativity.
[13:08]
Yucca Lee
And for people who want control or consistency, I think that'd be key.
[13:12]
Justine Moore
Yeah. And to Justine's point exactly, what is the implication for the professional use cases? Like, with the JSON prompting, what can they do more easily now with this capability compared to before?
[13:27]
Mohamed Noorouzi
Maybe zooming out a bit. The world is changing, right? I'm sure your work has changed a lot. My work has changed a lot. I'm actually writing PRs now, which is very exciting.
[13:38]
Justine Moore
Amazing.
[13:40]
Mohamed Noorouzi
Or in collaboration with AI. AI does the writing. And then for creative professionals as well, the world is changing. They are very excited to... okay, a large number of them are very excited. I think more and more creatives are excited about adopting AI and they see the potential. We think ideation is still the most important part of the creative process, and humans are very good at putting context into these models, and their understanding of the situation and their creativity will result in the best ideas. I'm very excited about a future where every kid will have these models at their fingertips, and then they can be much more creative and we're going to experience a much more beautiful world. Now, when it comes to JSON prompting, again, because the JSON prompt describes every detail in the scene, you can take it and change one element in the scene and that results in very, very consistent output. So you could be describing like a tiny detail in the corner of the image and then leave everything else the same. And we think this also has a big implication on editing. We haven't released our editing models yet, but they will also utilize the same JSON prompting approach, and it's just more control. And with layout as well, you can imagine for every brand, you have brand guidelines in terms of, okay, the size of text, the font of text. And we think this kind of foundation allows us to really get into a lot of the enterprise use cases.
[15:23]
Yucca Lee
Amazing. We've been talking about some of the things you've focused on with this model. Obviously there are always trade-offs in model training in terms of what you want the model to be really amazing at and what you focus on. Would love to hear what you focused on for this release. And also, do you think more in terms of capability, like we want to be the top for this specific thing, prompt adherence or something like that, or is it more thinking about the end user and, holistically, what are the different vectors where they want the model to be performant?
[15:56]
Mohamed Noorouzi
Right. So we care about a couple of things. One is graphic design in general. Again, text rendering is part of that. We think basic graphic design is everywhere. Like you go in a city, you open your eyes, you see billboards, you see storefronts, they all have text. And actually it's much more important than, I guess photography is part of graphic design. But graphic design is actually the frontier for a lot of business use cases, for storytelling. So we definitely focused a lot on graphic design since the release of our first model, which was good at text rendering. And in addition, we think taste is extremely important. We really want our models to have taste, and it's very hard to explain what exactly taste is. One element of taste is kind of going outside of the norm a little bit and not conforming to the average opinion, which is a little against being on top of the leaderboard.
[16:58]
Yucca Lee
Yeah.
[16:59]
Mohamed Noorouzi
Which is kind of interesting.
[17:00]
Justine Moore
Your own leaderboards.
[17:01]
Mohamed Noorouzi
Yeah, we just came out. Ultimately, we worked with all the arenas we hope all of the arenas will improve in detecting the nuances of images and image quality. But we care about our own internal evaluation and unfortunately we see that AI is not very good at doing the actual taste evaluation yet. So we worked with designers and we have side by side comparisons between different versions of the model as well as other models to really push on the taste. So we really care about taste. I think there's still so much more to do. Obviously.
[17:40]
Yucca Lee
Yeah, I was going to ask, my follow up was going to be do you have like one vibe guy or vibe woman internally who's like the taste arbiter? Because it can be hard to measure taste, but it sounds like you have a group of designers, which is probably better.
[17:51]
Justine Moore
I think.
[17:52]
Mohamed Noorouzi
Yeah, we need to find that tastemaker.
[17:55]
Yucca Lee
Yeah.
[17:58]
Justine Moore
And then I guess one thing we were always wondering is that this released open source model is so small, it's 9.3 billion parameters. You know, like previously the SOTA is probably like 80 billion parameters. So it's like 9x of a difference, and then you can run it on a single GPU instead of having a lot of compute footprint, which really opened up opportunity for people to use it. So the question is, how did you do it?
[18:26]
Mohamed Noorouzi
We focused on the details of the model, and we know we can't win on scaling. I used to work for Google. I don't think, even if we raise 10x the amount we've raised so far, we can beat Google in terms of the number of chips that we can dedicate to each model training. So instead we focused on innovation. We think there's still so much more to do to innovate. We are also focusing on differentiation. I don't think a lot of labs are focusing on design, graphic design in particular, the editable text that I'm talking about. And then we also decided to go open-weight to really partner with a lot of other platforms, to be at least another option for people who care about design. And so, yeah, we focused on the small model primarily because we think there's still so much to do. We think now is actually a good time for us to scale given the quality of the model at 9.3 billion parameters. You should imagine what if this model is 100x bigger, and there are mixture-of-experts architectures that don't necessarily make the model slower, but they make the model a lot more powerful. So I think that's one new frontier for us, to kind of scale this model 10x, 100x.
[19:47]
Justine Moore
I imagine because it's a smaller model, as you mentioned, it's harder to win on scaling and counting the number of chips, but it is possible to win on a specific domain or by optimizing for a different thing in a different domain. So what was the trade-off for the research team when training this model to decide what to focus on?
[20:11]
Mohamed Noorouzi
Right, so one thing that you kind of alluded to is the fact that this can run on consumer GPU now. And we think there is a new frontier that, you know, you do a lot of editing on your phone, a lot of image generation on your phone, and it's not only about pushing quality at, you know, 100 billion parameter, 1 trillion parameter range. We think it's really important to have small models that can run on device. Obviously, a lot of companies care about privacy and we are really excited to partner with the industry to push the kind of small model size quality further. Now, in terms of the research team, it's an interesting question whether you can focus on a very small, narrow field in image generation. I sort of believe that you need a general understanding of the world in order to even be good at logo generation or be good at illustration style. But once you have a general base and then you can customize the model for certain use cases and it can be the best at that particular use case. So we are really excited about customization. We think that's a new frontier. And again, that's an important reason for releasing the model with open weights. That is, every artist who has at least like 50 pieces of art or hopefully like a little more, they can really customize this model to the nuances of their style, the texture of their canvas, and really get 2k output and hopefully make that part of their workflow and be augmented with AI and be a lot more productive and creative. And we actually have worked with some artists in residence who said to us, okay, this at least made me 3x faster in making this comic book. So that's one frontier. And another frontier is enterprise. Again, because it's not about how good a model is in the general sense, it's about how good is this model for my use case. Right. At the end of the day, I may not care about many of the general purpose use cases, I may not care about character consistency, for example, as an enterprise. 
But even though this model is small, I think it can be the best model for particular use cases, whether that's a certain artist or whether that's an enterprise.
[22:46]
Yucca Lee
Were you getting a lot of demand from enterprises, when it was a closed model, that wanted to fine-tune it on their own data? Were there use cases where you're like, we really need to open source it now because there are so many cool things that people want to do?
[22:58]
Mohamed Noorouzi
Yeah, yeah. So some of the enterprises we work with are very sensitive. They don't want to talk about using AI in their visual side of things. But what we've seen over and over again is companies come to us and say, we tried these generic models and they don't meet our design bar, they don't follow our style, they don't follow our brand guideline. And once we train custom models for them, they are like, wow, this understands my brand DNA. Now we can use this for design ideation or we can use this for marketing. And we thought with the open-weight release we can give a glimpse of customization to developers within enterprises and kind of scale this side of our business further. And we're really excited about that.
[23:57]
Justine Moore
For enterprises, as you alluded to, I mean, customization is really top of mind, whether that's a brand kit or it's something that is stylistically just them, but it's hard to encode that style into just a doc. So I imagine customization on top of an open source model is the best way to go. So the question becomes like, what is the ramp up for the customers or the artists to start post-training or fine-tuning on top of the Ideogram model? What do they have to do?
[24:26]
Mohamed Noorouzi
So one thing I should say is we will work with the open source community to make it as customizable as possible, but because we are the model developer, we have some secret sauces that can make it even better. So what will happen is there will be different ways of customizing. One is in the open source, based on the quantized model that's already released. The other is we already have a product that allows you to customize by just uploading a certain number of images to our custom model training app. And we haven't released the 4.0 version of that yet, but we are hoping to release that as well. And then the kind of third category is when enterprises work with us and we really describe these detailed prompts for every image. We work with their design team to understand what words they want to use, because each team has a different set of keywords. Each company may have certain mascots who have certain names. And so our annotation team gets involved and spends a lot of time curating and cleaning data. And we think depending on your size and your budget, you should still be able to customize the model, maybe use the open source at the low budget, and then you can come and talk to us so that we can build a model for you at the high budget. But it really depends on the ROI that you have in mind for your models.
[25:56]
Yucca Lee
One of the things I think people have been talking about a lot, both for enterprise and actually for consumer, is fine tuning versus image editing. I actually think they don't necessarily have to be competitive. Like some people use image editing as a way of fine tuning. Like they say, take this image and put it into this style. Others think it's much more efficient and consistent to just fine tune a model to generate in that style. I know you alluded to wanting to release an editing model further down the line, so would love to hear how you guys are thinking about that.
[26:26]
Mohamed Noorouzi
Yeah, I agree. I mean, I think editing is very powerful. We both agree. We all agree. When editing launched last year, so many new possibilities opened up. And the nice thing about editing is it's quick. You don't have to train a model. You just take a style or an existing image and you make some changes. And it's part of your iterative workflow. Because with every creative that we worked with, it's never one-shot prompting. It's often, okay, you get something and then you're gonna go and fix certain details after the first generation. Now that's for editing. But then customization really gives you freedom to not prompt at all, right? Because sometimes it's very hard for you to say, oh, I want to get inspired by this single image and edit that one. You may have some general style in mind and want to ideate in the context of that general style. Or you may have a character that has many detailed degrees of freedom or characteristics. The left side may be different from the right side, and there's a certain outfit. And it's very hard to really put all of those images as the input to your editing model, and it often fails. So we think customization can give you a lot more powerful adherence to your characters and allows for easier iteration and ideation. So I agree with you. I don't think they are mutually exclusive, but they're both very powerful.
[28:03]
Justine Moore
With the JSON prompting and editing and model fine-tuning, the composability aspect of the model is just huge. There are so many ways you could customize it. One hot topic in the industry, in the research community, is the agentic loop for creative tools. It used to be that, for creativity tools, the consumption layer was always a UI. As a human, I look at it and then I make modifications. Now so much of that may become an API request that the agent makes. How do you see that? What would the API entail compared to how humans use it?
[28:43]
Mohamed Noorouzi
On your earlier point, I want to say something, which is we seem to compare image generation with language models a lot. And for example, in the language model space, even though customization exists, it's not like every company customizes their language models. But I think that actually misses the point. When you look at the visual representation of a brand, you immediately recognize the differences between brands. But if you look at the written communication, can you say, oh, is this Andreessen Horowitz or is this Sequoia? I mean, you probably can tell.
[29:20]
Yucca Lee
I hope so. I get the point.
[29:24]
Mohamed Noorouzi
But most people will not be able to immediately look at the text.
[29:28]
Yucca Lee
That's true.
[29:29]
Mohamed Noorouzi
And then say. So there is a lot more diversity in the visual world and that's very exciting for customization. And there are a lot of unique ways of interacting with the models. That kind of goes back to your earlier question. A lot of 3D manipulation. So for that kind of use case, the input will be some 3D representation of the joints or position of the objects, then you may have a completely different, you know, stylistic variation with the style being the input. So there are a lot of different types of interactions that you want to enable. And for that reason, it's much different from the language space where the input is always text and, like, you can more or less convert everything to text. We are so excited about agents. We have our own MCP. We use them a lot internally. What's really exciting is when you want to release a new feature, you can go into your agent and then ask it to connect to the API and generate a bunch of images, and then you can go and find the best ones. And like, in a couple hours, you have your landing page up and running. So we are very, very excited about agentic workflows. We think this is just at the beginning. To your point, we need evaluation as part of the loop. We don't want to have to look at every image, and then editing will be part of the agentic interaction too. How exactly we want to compose these different pieces to accomplish a goal is still to be discussed. But we have an API and we have MCP and we really want to enable the agentic workflow. And we think every company is trying to figure it out as well. So we're really excited about that.
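[Editor's illustration] The loop described above (generate a batch, evaluate automatically, edit the best candidates, repeat) can be sketched roughly as follows. This is a minimal sketch, not Ideogram's API or MCP interface: `generate`, `evaluate`, and `edit` are hypothetical stand-ins that only simulate calls, and the scoring is a seeded placeholder rather than a real image-quality evaluator.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    prompt: str
    seed: int
    score: float = 0.0


def generate(prompt: str, seed: int) -> Candidate:
    # Hypothetical stand-in for an image-generation API call.
    return Candidate(prompt=prompt, seed=seed)


def evaluate(candidate: Candidate) -> float:
    # Hypothetical stand-in for an automated evaluator (assumption:
    # some model scores each image so a human need not view them all).
    # Here it is just a deterministic pseudo-score in [0, 1).
    return random.Random(candidate.seed).random()


def edit(candidate: Candidate, instruction: str) -> Candidate:
    # Hypothetical stand-in for an image-editing call.
    return Candidate(
        prompt=f"{candidate.prompt} | edit: {instruction}",
        seed=candidate.seed + 1000,
    )


def agentic_loop(prompt, n_candidates=8, keep=2, max_edit_rounds=3, target=0.9):
    # 1. Explore: generate a batch of candidates and score each one.
    candidates = [generate(prompt, seed) for seed in range(n_candidates)]
    for c in candidates:
        c.score = evaluate(c)
    shortlist = sorted(candidates, key=lambda c: c.score, reverse=True)[:keep]

    # 2. Refine: edit the shortlist until the best score reaches the
    #    target or the edit budget runs out.
    for _ in range(max_edit_rounds):
        if shortlist[0].score >= target:
            break
        revised = [edit(c, "fix details flagged by evaluator") for c in shortlist]
        for c in revised:
            c.score = evaluate(c)
        pool = shortlist + revised
        shortlist = sorted(pool, key=lambda c: c.score, reverse=True)[:keep]

    # 3. Hand only the shortlist back for human review.
    return shortlist


best = agentic_loop("landing page hero image for a new feature")
```

In this sketch the evaluator is what keeps a person from having to look at every image; only the returned shortlist would go to human review.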
[31:17]
Justine Moore
I guess, for the API business, so much of design is iteration. It's the long tail of a design. It's no longer just you prompt something, you get an image and call it a day.
[31:27]
Yucca Lee
Right.
[31:28]
Justine Moore
It's so much of: get an image, use the edit model to edit it, see if it works well. If it doesn't, get another image with the JSON prompt, which is easier for control. What are the net new use cases you have seen after launching the model in how people compose different API calls on Ideogram?
[31:49]
Mohamed Noorouzi
What's interesting is yes, we have these agents, a lot of them live in a chatbot and that's not enough for iteration, unfortunately. So I think what's unique about that is you can really scale creativity, like kind of give it some high-level direction and ask it to go and explore many different approaches and come back with hundreds of thousands of designs that can be easily, you know, looked at, and then you get a better sense of, okay, I want to explore more in this direction. So the language model interaction allows for kind of very large scale exploration of creative possibilities. But once you know what you want, you need a UI, you need a UX to be able to go and edit. And whether it's regional editing or text-based editing, I think at the end of the day you want your canvas, you want to be able to point to things and then you want to also talk to it with natural language. And it's actually very hard work because models are changing and now you're also designing the user interface at the same time. So kudos to the best designers who understand how these models work and are trying to figure this part out. There's still a lot of work to do. In terms of the things we've seen from the model, again, design is something that's coming up a lot. And I saw a tweet before coming here where somebody said, I have no design training and I got this design in two minutes, and it actually looked really, really nice. That was one example. And then people are really excited about the art possibilities because this model was trained with very unique style descriptions as part of training. We actually stripped that from the JSON prompt because it became too much. But the model has a lot of artistic possibilities and, like, many different styles are embedded into the model. And if you've seen some of the frontier models that actually score very highly in the leaderboards, they don't have a lot of design variation. They always produce the same exact look. And I believe that's because they did a lot of reinforcement learning. We actually have done very little reinforcement learning. So this is a very raw model. Now with that, the outcome is you need to be much more precise with your prompting. But you can get a lot of different styles from the model, and people seem to be very excited about that aspect of the model, especially in the art community.
[34:38]
Yucca Lee
I think when you talk both about the design and the art, it really brings back the taste point you made earlier. Because for so many sorts of designs, you're trying to communicate some sort of idea, whether it's like an infographic or an ad or a logo or whatever. And it kind of needs to stand out and, like, be distinct to you or have a unique style. And I've totally noticed what you said with a lot of the frontier, the, like, historical frontier image models. Like, you're scrolling the feed and you just see, like, I've now seen this style, like, 50 times. I've seen it a hundred times. It doesn't, like, catch my eye anymore. And it feels like now when I prompt Ideogram 4, I often get something that, like, makes me stop and be like, wow, this is different than anything I've seen before coming out of an image model. And, like, this is doing an amazing job of both communicating what I want to communicate and also holding someone's attention.
[35:29]
Mohamed Noorouzi
Yeah. So we tried to enable very different styles, and that was one of our goals. We still want to produce tasteful output, but that doesn't mean we have to force a complex output on you. If you want a minimalist design, you should be able to get that. Actually, our minimalist is too minimalist, in my opinion. I was saying we should ban the word minimalist from the output, but you get what I mean. It's like the model can do many different things, and that's by design.
[36:04]
Justine Moore
We know what's the first Ideogram model we're going to post-train. It will be an A16Z marketing brand style. Art deco. A16Z art deco.
[36:12]
Mohamed Noorouzi
Yeah.
[36:12]
Justine Moore
Let's do it for the new branding.
[36:15]
Mohamed Noorouzi
Yeah, let's do it. That's very exciting.
[36:17]
Justine Moore
Yeah. So I guess the question also becomes like. I kind of asked you the representation question before, which is, here's a JSON representation. As an artist, obviously if you abstract it out far enough, all the lines are pixels. So you could say that the composability is on a pixel level, which is actually different from the diffusion representation. It's like denoising and they operate in a different space. Where does this lead to? If you travel down the JSON path granular enough, does it lead to pixels? Does it lead to SVGs? Does it lead to language or something else?
[36:53]
Mohamed Noorouzi
That's a very good question. So in general, the recipe for building more powerful models, in my opinion, is making the task as straightforward as possible for the diffusion model, i.e., specify the exact details of the image. And so now if you take that to the extreme, then it becomes the pixels themselves, so the diffusion model doesn't have to do anything. Now the catch is, what we would like to do is to get the language model to produce that intermediate representation. And language models, as of now, aren't very good with continuous output, they aren't very good with pixel values or very high dimensional vector representations. So I guess the constraint here is that the representation has to be tokens. I mean, depending on your language model power, maybe it can be a million tokens, but that's too extreme; for us, it's about 4,000 tokens. And that's where we still use natural language, because these large language models are trained with natural language, so they are very good with natural language. But it may become closer to HTML, for example. That's okay because again, large language models are trained with HTML and they know the tokens.
[38:14]
Justine Moore
But would you have to design your own version of HTML or would you align to HTML?
[38:17]
Mohamed Noorouzi
It's actually kind of alluding to the editable model that I'm talking about. And we've had a lot of back and forth. Are we going to have our own JSON for different, you know, text elements and buttons and stuff, or are we going to use HTML? And it seems like HTML makes more sense, just because these large language models have already been trained on HTML, as opposed to us introducing a new JSON structure. But I would say, to answer your question, that representation needs to be easy for the language model with the particular design that we have right now, which is that a language model does some expansion of the ideas, and then the image model takes those expanded descriptions and turns them into images.
[39:05]
Yucca Lee
We'd love to hear, if there's anyone who's interested in working at Ideogram or working with you guys as a customer or to fine-tune a model, what's the best way for them to get in touch with you and the team?
[39:16]
Mohamed Noorouzi
First of all, we would love to work with more engineers, cracked engineers. We have a very tiny team. You see what we were able to produce. It's such a tiny team. And if you want high agency, if you want your work to matter, and you want to be part of the academic and open source ecosystem, then this is the perfect time to join us. Now, in addition, on the enterprise side, we see the potential and would like to work with the most creative brands out there to help them produce the best designs, produce the most provocative ads. And also we would like to partner with other startups or other companies at different levels of the stack. This is open weight so we can make it win-win. And we would love to offer a different option to companies who want more control and data privacy and sovereignty. So we would love to work with other enterprises as well across the stack. And then the best way would be, like, we have this email, partnerships at Ideogram. You can DM me on Twitter or LinkedIn, and I'm very active on both platforms.
[40:34]
Justine Moore
Today, if I want to fine-tune my own style, where should I go on Ideogram?
[40:39]
Mohamed Noorouzi
Just come to us.
[40:41]
Justine Moore
Is there a call to action we should tell people?
[40:45]
Mohamed Noorouzi
There is actually a Model tab. If you log into Ideogram, there's a Model tab and then you can go and upload your images and train your model. It's a little more expensive, it's 60 bucks for two model trainings per month. But we think for professionals that's totally worth it.
[41:02]
Justine Moore
Absolutely. And how many images do you need to get started?
[41:06]
Mohamed Noorouzi
I think you need at least 15.
[41:09]
Justine Moore
So for an enterprise, if they want to fine tune their own model on Ideogram, like what's your guidance on what they should upload on Ideogram to start fine tuning the model?
[41:19]
Mohamed Noorouzi
I think for enterprise again we have some sales forms they can fill and then come talk to us because we see that there are many differences in what different companies want. Some companies care more about editing, some companies want more like on the marketing side to automate some of their ads. And we should talk first and then figure out what's the best solution for that.
[41:45]
A16Z Podcast Host
Great.
[41:45]
Yucca Lee
Awesome. Thanks so much for joining us.
[41:47]
Mohamed Noorouzi
Thank you so much.
[41:50]
A16Z Podcast Host
Thanks for listening to this episode of the A16Z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review and share it with your friends and family. For more episodes go to YouTube, Apple Podcasts and Spotify. Follow us on X @a16z and subscribe to our Substack at a16z.substack.com. Thanks again for listening and I'll see you in the next episode. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.