
Podcast Summary: "Nobody wanted to do this work": How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries

Podcast: How I AI
Host: Claire Vo
Guest: Tim McLear, Producer & Technologist at Ken Burns’ Florentine Films
Date: November 17, 2025
Length: ~48 minutes

Overview

The episode explores how Tim McLear—an Emmy Award–winning producer at Florentine Films—uses AI not for creative content generation, but as a practical tool to automate the "technical mess" of post-production in documentary filmmaking. The conversation is a deep dive into building custom AI-powered tools that automate data entry, image and metadata management, field research workflows, and archival document transcription, freeing creative teams from the drudgery of tedious, manual work.

Main Theme:
Using AI to automate, organize, and optimize the vast, repetitive tasks behind the scenes in documentary production, enabling teams to focus on research and storytelling.
Purpose:
To demystify how non-engineers and creative professionals can practically apply AI in highly specific, impactful ways—beyond flashy generative applications—by building or leveraging their own workflows.

Key Discussion Points & Insights

1. The Problem: Post-Production is a Data Mess

Timestamps: 00:00–04:45

Documentary filmmaking involves handling hundreds of hours of footage and tens of thousands of images, which must be meticulously logged, tagged, and made searchable for future use.
Manual data entry ("logging") is time-consuming and widely disliked.
The shooting ratio for documentaries can be extreme. Example: For an 8-hour Muhammad Ali series, the team gathered about 20,000 stills, over 100 hours of footage, and roughly 35 interviews.
- Quote:
  "Post production is like a technical mess of media management. ...The data management piece...is the mess that I have used AI to tackle."
  (Tim, 00:08)

2. From ChatGPT to Custom Automation: The Evolution of the Workflow

Timestamps: 05:05–15:15

Initial Inspiration: The arrival of ChatGPT’s image upload was an "aha!" moment, opening doors for automated database entry.
- Quote:
  "It was this insane day for us...an aha moment where it was like, ‘Oh my god, this thing can see.’"
  (Tim, 05:21)
Early Days:
- Started with simple scripts to generate descriptions of images via OpenAI’s vision models.
- Quickly realized that generic AI-generated descriptions weren’t sufficient; context (e.g., exact location, date) was missing.
- Solution: Designed scripts to first extract embedded metadata and append it to AI prompts—elevating both accuracy and richness of database entries.
  - Quote:
    "When you give [AI] the tools and information to write a better description, it’s gonna get there."
    (Tim, 10:46)
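The metadata-first idea above can be sketched in a few lines of Python. This is a minimal illustration, not Tim's actual script: the function name, the prompt wording, and the metadata field names are assumptions, and the real version would read the fields from the file and send the prompt to a vision model together with the image. The sample values echo the demo image described in the episode.

```python
def build_description_prompt(metadata: dict[str, str]) -> str:
    """Compose a vision-model prompt that treats embedded metadata as ground truth."""
    base = "Describe what is visible in this image in two or three sentences."
    # Drop empty fields so the model is never handed a blank "fact".
    facts = {key: value.strip() for key, value in metadata.items() if value and value.strip()}
    if not facts:
        return base + " No embedded metadata was found, so do not guess the place or date."
    listing = "\n".join(f"- {key}: {value}" for key, value in sorted(facts.items()))
    return (
        base
        + "\nTreat the embedded metadata below as the source of truth for names, "
        + "places, and dates:\n"
        + listing
    )


# Field names are hypothetical; the values echo the episode's demo image.
prompt = build_description_prompt(
    {"Title": "Main street of Cascade, Idaho", "Creator": "Russell Lee", "Date": "1941", "Notes": "  "}
)
print(prompt)
```

Keeping the prompt builder separate from the API call makes it easy to inspect exactly which "facts" the model was given when a description later needs fact-checking.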
Choosing Models:
- Used OpenAI for vision/image analysis (first to market with vision APIs).
- Used Anthropic Claude models for coding in Cursor, with prompts dictated through Super Whisper.
- Quote:
  "They had a vision preview on their API. ...I had built up enough of an infrastructure using that API call that...the switching costs were too much."
  (Tim, 12:24)

3. Scale and Sophistication: Building a Custom, Extensible AI Logging System

Timestamps: 12:41–21:22

Current State: A REST API called “Autolog” handles multi-step metadata extraction for any file—stills, video, music.
For images:
1. Gather file specs
2. Move/copy files
3. Parse for metadata
4. Scrape web for additional info
5. Generate robust, AI-assisted descriptions
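The five steps above can be pictured as an ordered chain of small functions. The sketch below is only a skeleton under stated assumptions: the `Asset` class, every function name, and the record fields are hypothetical, and each step body is a stand-in for real work (file copying, metadata parsing, web scraping, a model call) that the episode describes but does not show in detail.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Asset:
    asset_id: str
    source: Path
    record: dict = field(default_factory=dict)


def gather_specs(asset: Asset) -> Asset:
    # Step 1: basic file facts (only the format here; size and dimensions omitted).
    asset.record["format"] = asset.source.suffix.lstrip(".").lower()
    return asset


def copy_to_server(asset: Asset) -> Asset:
    # Step 2: stand-in for the copy; just records the ID-based file name.
    asset.record["stored_as"] = f"{asset.asset_id}{asset.source.suffix.lower()}"
    return asset


def parse_metadata(asset: Asset) -> Asset:
    # Step 3: stand-in for reading embedded metadata from the file.
    asset.record.setdefault("metadata", {})
    return asset


def scrape_url(asset: Asset) -> Asset:
    # Step 4: stand-in for gathering extra context from the source web page.
    asset.record.setdefault("web_notes", [])
    return asset


def generate_description(asset: Asset) -> Asset:
    # Step 5: stand-in for the model call, grounded on what steps 3 and 4 found.
    grounded_on = len(asset.record["metadata"]) + len(asset.record["web_notes"])
    asset.record["description"] = f"PENDING model call ({grounded_on} grounding facts)"
    return asset


STEPS = [gather_specs, copy_to_server, parse_metadata, scrape_url, generate_description]


def run_autolog(asset: Asset) -> Asset:
    for step in STEPS:
        asset = step(asset)
        asset.record.setdefault("completed", []).append(step.__name__)
    return asset


result = run_autolog(Asset("ST00123", Path("scans/main_street.JPG")))
print(result.record["completed"])
```

Running description generation last means it always sees whatever the metadata and web steps found, which is the guardrail the episode keeps returning to.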
For video: samples one frame every 5 seconds to keep API costs down, captions each frame with an inexpensive model, transcribes the audio with Whisper in matching 5-second increments, then sends the captions and transcript together to a reasoning model that summarizes what happens in the video.
- Quote:
  "For the frame captions themselves, I’ll use a cheap model… But then all the data goes to a reasoning model to get what’s happening in the video."
  (Tim, 15:25)
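A rough sketch of the two-level video approach: sample timestamps at a fixed interval, peg each stretch of speech to the frame window it starts in, and flatten everything into one prompt for a reasoning model. The function names, the event layout, and the sample captions and speech are made up for illustration; frame extraction, captioning, and transcription are assumed to have happened already.

```python
def sample_times(duration_s: float, interval_s: float = 5.0) -> list[float]:
    """Timestamps (in seconds) at which to grab one frame."""
    times = []
    t = 0.0
    while t < duration_s:
        times.append(t)
        t += interval_s
    return times


def build_events(duration_s, captions, segments, interval_s=5.0):
    """Pair each sampled frame caption with the speech that starts in its window.

    captions: {timestamp_s: caption}; segments: [(start_s, text), ...].
    """
    events = []
    for t in sample_times(duration_s, interval_s):
        speech = " ".join(text for start, text in segments if t <= start < t + interval_s)
        events.append({"t": t, "caption": captions.get(t, ""), "speech": speech})
    return events


def parent_prompt(events) -> str:
    """Flatten the events into one text prompt for a reasoning model."""
    lines = []
    for event in events:
        minutes, seconds = divmod(int(event["t"]), 60)
        speech = event["speech"] or "(no speech)"
        lines.append(f"[{minutes:02d}:{seconds:02d}] frame: {event['caption']} | audio: {speech}")
    return (
        "These are the events observed in one video, in order.\n"
        + "\n".join(lines)
        + "\nTell me what you think is happening in the video."
    )


# Invented captions and speech for a 12-second clip.
captions = {
    0.0: "An alligator in a swamp",
    5.0: "The alligator slides into the water",
    10.0: "Ripples on the water",
}
segments = [(6.2, "There he goes.")]
events = build_events(12.0, captions, segments)
print(parent_prompt(events))
```

A fixed interval is the cheap option; as Tim notes, sampling at lighting changes or other cut points would be a smarter but more involved alternative.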
Embeddings power semantic search, making archives vastly more discoverable:
- Uses CLIP for image embeddings, OpenAI models for text, then fuses them.
- Editors can now "find similar" images or assets by vibe/similarity instead of crude keyword matching.
  - Quote:
    "Now the ability to discover semantically is, I think, the most robust part of the system."
    (Tim, 19:38)
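The episode does not say how the image and text embeddings are fused, so the sketch below assumes one simple option: L2-normalize each vector, scale the two by weights, and concatenate them, so that a dot product between fused vectors is a weighted mix of image similarity and text similarity. The toy 2-dimensional vectors and asset names are invented; real CLIP and text embeddings have hundreds of dimensions.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(image_vec, text_vec, image_weight=0.5):
    """Concatenate normalized image and text vectors into one unit-length vector."""
    wi = math.sqrt(image_weight)
    wt = math.sqrt(1.0 - image_weight)
    return [wi * x for x in normalize(image_vec)] + [wt * x for x in normalize(text_vec)]


def find_similar(query_id, index, top_k=3):
    """Rank every other asset by dot product with the query's fused vector."""
    query = index[query_id]
    scored = [
        (sum(a * b for a, b in zip(query, vec)), asset_id)
        for asset_id, vec in index.items()
        if asset_id != query_id
    ]
    return [asset_id for _, asset_id in sorted(scored, reverse=True)[:top_k]]


# Toy index: two similar portraits and one unrelated street scene.
index = {
    "lincoln_portrait_a": fuse([1.0, 0.0], [0.9, 0.1]),
    "lincoln_portrait_b": fuse([0.9, 0.1], [1.0, 0.0]),
    "cascade_main_street": fuse([0.0, 1.0], [0.1, 0.9]),
}
print(find_similar("lincoln_portrait_a", index, top_k=1))
```

Because the weights are applied as square roots, each fused vector has length 1, so the dot product equals `image_weight` times the image cosine similarity plus the remainder times the text cosine similarity.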

4. Automating Field Work: The ‘Flip Flop’ App for Archival Research

Timestamps: 24:44–31:35

Pain Point: In archives, teams take massive numbers of iPhone snaps (front and back of images) that used to be an organizational nightmare.
- Prior method: chaotic camera rolls, hard-to-match image pairs.
Solution:
- Tim "vibe coded" (AI pair-programmed) an iOS app called Flip Flop.
- Lets users create collections (folders), snap fronts and backs, associate pairs, and embed OCRed backs as EXIF metadata in real time.
  - Quote:
    "Now anytime anybody uses one of these images…that image is embedded with that metadata."
    (Claire, 29:19)
- Drastically streamlines organization and retrieval—even for people outside the app ecosystem.
  - Quote:
    "I had two colleagues out in the field a couple weeks ago…they came back with 1400 images…Flip Flop is certainly making the process easier."
    (Tim, 29:33)
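The pairing idea can be illustrated in Python, although the real Flip Flop app is an iOS app and its internals are not described. This sketch assumes captures arrive in strict front, back, front, back order and that OCR text for each back is already available. The class, function, and file names are hypothetical; `ImageDescription` is a standard EXIF tag, used here only as a plausible place to store the text.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    collection: str
    front: str
    back: str
    exif: dict


def pair_captures(collection, filenames, ocr_by_file):
    """Group captures shot in front/back order and attach the back's OCR text."""
    if len(filenames) % 2:
        raise ValueError("expected captures in front, back, front, back order")
    pairs = []
    for front, back in zip(filenames[0::2], filenames[1::2]):
        text = ocr_by_file.get(back, "").strip()
        # The front image carries the text found on its back as metadata.
        pairs.append(Pair(collection, front, back, {"ImageDescription": text}))
    return pairs


pairs = pair_captures(
    "Box 12",
    ["IMG_0001.jpg", "IMG_0002.jpg", "IMG_0003.jpg", "IMG_0004.jpg"],
    {"IMG_0002.jpg": " Main street, 1941 ", "IMG_0004.jpg": ""},
)
print(pairs[0])
```

Writing the text into the image file itself, rather than into an app-only database, is what lets the information travel with the image to people who never open the app.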

5. AI for Archival Document OCR: 'OCR Party' App

Timestamps: 32:03–36:44

Problem: Transcribing only portions of historical documents (old newspapers, handwritten letters) is tough; generic OCR is often not accurate.
OCR Party: A custom Mac menubar app lets users crop the document, choose between Apple’s vision OCR or an AI API, and extracts just the needed text—including challenging sections.
- Also highlights AI's strength in reading cursive, noisy scans, translation, and partial documents.
- Genealogy use case: Claire’s mother uses similar workflows to transcribe historical names from old documents, which AI excels at.
  - Quote:
    "AI is really good at OCR of old documents. It’s really good at handwriting. It’s pretty good at translation, too."
    (Tim, 32:20)

6. Broader Lessons & Takeaways

Timestamps: 31:35–36:44

Custom AI-Enabled Tools:
- With today’s AI, creative and technical professionals can "vibe code" specific, high-impact personal tools—which would never make sense for a commercial product.
- Apply AI + light software engineering to domains/files you know well.
  - Quote:
    "No one was going to make me this app. And so the ability to make an extremely specific app…has been an unbelievable moment."
    (Tim, 35:39)
AI and File Types: Consider what information can be embedded or extracted from various file types—a new use case for AI emerges when you “load up” files with context via code.

Lightning Round: Philosophy and the Future

Timestamps: 37:46–44:49

Learning & Upskilling:
- Tim draws analogies with learning creative tools (Photoshop, Premiere), seeing AI coding as creative design more than hard engineering.
  - Quote:
    "Coding feels so much more creative than technical...these tools feel really like creation engines."
    (Claire, 39:06)
AI Skepticism & Ethics in Film:
- The industry is wary of generative video’s potential for job displacement and authenticity risks.
- Tim distinguishes:
  - Nonfiction/archival: Should not generate fake footage or mislead.
  - Practical AI tools: Empower researchers and creatives by cutting toil, not replacing core skills.
  - Quote:
    "We should not be generating archival footage. ...[but] there’s a place in the process for it which allows you a place to learn without thinking it needs to end up in the final product."
    (Tim, 40:46 / 43:18)
Prompting Advice:
- If AI is not responding well, start a new thread or ask for a “resume work” prompt to clarify.
- Be polite to the AI; prompts often go better.

Notable Quotes & Moments

"Automate away toil. That's what you wanna do."
(Claire, 00:36 & 37:46, recurring theme)
"Nobody wanted to do this work" (regarding manual data entry)
(Summed up in title and throughout)
"I think the best argument I have for all the work I've done...is that the same people who used to write this data were the ones responsible for doing the research. So you've now freed them up to just look more."
(Tim, 22:01)
"I have a button down here where...if I like an image I can click 'find similar' and it's just going to go and find every image that kind of has that vibe."
(Tim, 22:51)
"No one was going to make me this app. And so the ability to make like an extremely specific app that makes a workflow...easier, it's been an unbelievable moment."
(Tim, 35:39)

Section-by-Section Timestamps

Intro and Overview: 00:00–03:49
Motivation & AI’s Role: 03:49–05:21
Early Automation & Demos: 05:21–12:41
Building Out the System: 12:41–19:10
Embeddings & Search: 19:10–22:51
Semantic Search & Discovery: 22:51–24:44
Flip Flop App & Field Work: 24:44–31:35
File Embeddings & Takeaways: 31:35–32:03
OCR Party & Handwriting: 32:03–36:44
Product Philosophy/Learning: 37:46–40:15
Industry Concerns & Advice: 40:15–44:49
Prompting Techniques & Wrap: 44:49–46:24
Contact & Show Info: 46:24–end

Practical Workflow & Tips from the Episode

AI + Manual Guardrails: Always fuse AI’s generative power with hard metadata or source context to boost trust and accuracy, especially for archival/historical work.
API-first Automation: Even "non-engineers" can build powerful workflow automations via API, AI script-writing, and micro-apps.
Use ‘Vibe Coding’: Design UIs and features by “speaking in screens” and partnering with AI coding assistants.
Think Custom: If you have a repetitive task nobody wants to build a product for—build your own AI-powered micro tool!
Embed Metadata Early: Automate embedding of all critical info into files at the moment of capture, not downstream.

Conclusion

Tim McLear provides a masterclass in “AI for real work”—demonstrating how filmmakers and creatives can use today’s tools not as replacements for human creativity, but as crucial force multipliers for the tedious, time-sapping, and universally dreaded parts of research and post-production. The episode is both practical and inspiring, emphasizing that anyone can leverage AI coding—even in highly specialized fields—to free up more time for meaningful, creative human work.

For more:

Tim’s site: timmacular.com (features his own AI chatbot "GP Tim")
How I AI Podcast and episode archive: howiaipod.com

Ideal Listener:
Anyone overwhelmed by digital assets, curious about practical AI, or itching to automate away tedious work—especially in creative or research-heavy industries.


Transcript

[00:00]
Claire Vo
How did you think about what problems there were to solve in AI relative to your job and the people that you work with? And why did you start where you started?
[00:08]
Tim McLear
Post production is like a technical mess of media management. You have many different file types, you have images, you have archival footage that you're gathering, live footage that you may have filmed out in the field, interviews, transcripts. So it ends up being hundreds of hours of footage, tens of thousands of photos. The data management piece, when you're dealing with all that different stuff, is the mess that I have used AI to tackle. My goal was to automate this. For years this has been manual data entry.
[00:36]
Claire Vo
Automate away toil. That's what you wanna do.
[00:38]
Tim McLear
No one was gonna make me this app. And so the ability to make an extremely specific app that makes a workflow on my team and my company easier. It's been an unbelievable moment.
[00:53]
Claire Vo
Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today we have Tim McLear, a producer at Ken Burns' Florentine Films who's responsible for the technology and processes that bring these amazing films to life. Instead of focusing on how AI can create creative for these films, we're actually going to talk about how Tim uses AI to build software products that make his post-production and research teams' lives a lot better. If you're working with images, video, sound or just a lot of data, this episode is a great one for you. Let's get to it. This episode is brought to you by Brex. If you're listening to this show, you already know AI is changing how we work in real, practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issued, expenses are filed and fraud is stopped in real time without you having to think about it. Add Brex's banking solution with a high-yield treasury account and you've got a system that helps you spend smarter, move faster and scale with confidence. One in three startups in the US already runs on Brex. You can too at brex.com/howiai. Tim, welcome to How I AI. I'm excited to have you here.
[02:27]
Tim McLear
Thank you for having me.
[02:28]
Claire Vo
What I love about what we're going to talk about today is you work in a very interesting and creative industry putting out amazing content and we're going to talk a little bit about how AI is impacting the creation side of things. But you've actually used AI to smooth out some of the challenges you've had on the production and post production side of things. So I'm curious, how did you think about what problems there were to solve in AI relative to your job and the people that you work with? And why did you start where you started?
[03:02]
Tim McLear
Yeah, I think most of the flashiest use cases of AI in creation or media and entertainment right now are often in like generating full video content or images or whatever it is. But post production specifically is like a technical mess of media management. Especially in nonfiction. You have like many different file types, right? And you have images, you have archival footage that you're gathering, live footage that you may have filmed out in the field, interviews, transcripts. And so like the data management piece, when you're dealing with all that different stuff is the mess that I have used AI to tackle. And I think that the sort of like AI as a tool versus AI for generation is even more immediately applicable in our field at the moment.
[03:49]
Claire Vo
Well, and I have a very, you know, very simple, humble little podcast. But even for us, we create a lot of research and, and longer content and we're editing it down. I'm just curious, with documentaries and nonfiction work, what do you think the ratio is of media captured, researched and archived to actually publish? Because that will maybe give us a sense of how much of this you have to grapple with to get a good, good piece of content on the end.
[04:17]
Tim McLear
We have a thing in our industry called a shooting ratio. And so you can imagine in like a fiction series or you know, like a sitcom on air, I don't quite know what those shooting ratios would be, but you're working with a script and so you're going to have a slightly lower ratio in documentary. It can get quite high. Like I can tell you that we made a series about Muhammad Ali a few years ago, is an eight hour show. We gathered 20,000 still images in the database of just stills. I think it was over 100 hours of footage because he had a lot of fights and that kind of thing, news news footage. And then we also filmed, I want to say like 35 interviews for the piece. So it ends up being like hundreds of hours of footage, tens of thousands of photos. And that's just like, that's one example of, you know, a particularly famous individual. But that tends to be what it looks like for our shows.
[05:05]
Claire Vo
So that's what you have to manage, make searchable, make usable by the entire production team. And you got inspired by ChatGPT and some of these early AI tools to do some of that. So you want to hop in and show us what you know the first use case is.
[05:22]
Tim McLear
Absolutely. So I'm going to start by kind of just showing you the, like, end result before I go right to, like, how I got here. So on any film that we work on, we end up having some kind of database. Right. So this is a database where you can see the still images we've gathered. You can see there's a footage section, a music section, anything that might go into the film and all the kind of stuff you might expect to see. Right. Descriptions, tags, a date on the thing, where we got it from. Some more technical detail is also going to appear over here. In any event, my goal was to automate this. For years this has been manual data entry. And so I remember vividly. I'm going to jump into Cursor now, but I do remember, like, when I first started doing this, it was ChatGPT. I remember ChatGPT added image upload and it was this insane day for us. I was, like, in the office with my colleague Clark, and we were just, like, throwing images at it and seeing kind of the quality of the output. Like, it was this aha moment where it was like, oh my God, this thing can see. And how could we harness this text generation to use it for our database entry? So I'm going to simulate the starting point and then we'll jump to where we're at today. But essentially what it looked like at the beginning was we would throw something into GPT and we would say, hey, can you describe this? And it would hallucinate a little bit. But it was so tempting to figure out a way to harness that. I started essentially, like, writing little Python scripts with ChatGPT. And at that time it was, like, VS Code on one monitor and GPT on another. And I'm going to. All right, I'm just going to go ahead and demo what that kind of looked like. I'm going to speak my prompts, if that's okay. I use this tool called Super Whisper because it kind of cleans up my off-the-cuff dictation. So I have an image here of a nice street in somewhere, America, maybe mid-20th century.
We're going to see what kind of description we get from AI. All right: "Write me a script that submits the JPEG at the root of this workspace to OpenAI for description. I want just a general visual description of what we can see in the image. Any API credentials you need are in a text file at the root of the folder." And what we can see here is that, like, everything I just said got funneled through this app called Super Whisper. So it got funneled through a prompt that itself is cleaning up my messy vibe coding. I think it's clean enough, so we're going to go ahead and submit it.
[07:57]
Claire Vo
And I see you're using Claude 4.5 Sonnet. Is that by choice or by default, or...
[08:02]
Tim McLear
That is because I'm on a podcast right now. To be honest, this is a very easy task for AI. I could keep it on auto for this. Right. I will say I switch between various Claude models depending upon the, like, difficulty. And I do try and be cheap and stay on auto if I know that I'm asking for easy stuff, you know.
[08:21]
Claire Vo
Okay, so you're just, you're. You're giving us a little bit of quality control here.
[08:25]
Tim McLear
Yeah, I don't want it to mess up. We're live on air, you know.
[08:28]
Claire Vo
Yeah.
[08:29]
Tim McLear
All right, so it's telling me that I need to install some requirements. My guess is I have those requirements. It's got a submit image script. Let's see what it did. Here we go. It's running. Submitting this image to OpenAI for analysis. What kind of description will we get? There we go. "This image depicts a small rural main street from what appears to be the mid-20th century." We had guessed that. "There are a series of wooden storefronts, each with signs indicating there are local businesses." Okay, so this is great. And this is kind of what we were getting in those early days of GPT image upload. But the problem here is, like, you're making a film, you want to know what rural main street, what town are we in? What is the exact year? And you can't really just go with this kind of generic description. So a lot of times we happen to know that images come with embedded metadata. And, you know, if you're using your iPhone camera today, you know that maybe there's some metadata, like GPS data, that kind of stuff. But archival images will often come with whatever notes people have scribbled onto them over time. And so now I'm going to iterate on this one time and say: "I want you to add a step to this script. I want to scrape any available metadata from the file first and append that to the prompt. The goal here is that we are using any available metadata as, like, a source of truth for what this image actually is and not just guessing."
[09:58]
Claire Vo
So just repeating that while this is running, what you're saying is for, for this particular use case, you're working with a set of archival photos from sources that have embedded probably additional layers of metadata into it that you can read that give more information, which is different than, you know, scanning something or taking something off your, off your phone, which I think we're going to look at a bit later. So you're trying to harness the structured metadata off this file, which, if you go back to the tab that shows the image we can't see with our human eyes, but our agent friends can read with its robot brain. And you're using that information to then upgrade this script that is going to do all this AI analysis for you.
[10:47]
Tim McLear
That's exactly right. And so in this case, it's going to be embedded metadata. I happen to know this is an image from Library of Congress. There's going to be some metadata on it, but it could also be something on the web. Like where this eventually goes to is like, okay, I know that there's a website with information may not be in the file, but hey, how about you go and scrape the web, gather anything you can know about this. Because ultimately, like, this is a journalistic endeavor. These shows get fact checked. We want everything going into our database to be true and verifiable information. All right, so let's see how it did when it added that metadata check. So we can see in the console, it did a little bit of a scrape. It looks messy as hell, but somewhere in here we can see stuff like, yeah, archival information. And it's now going to use that. And what we've generally found is that when you add those guardrails, when you give it information, you know to be true about the image, it relies on that so much more than just what it can see. Like, you know, AI really wants to perform for us. It really wants to do a good job. And so when you give it the tools and information to kind of write a better description, it's gonna, it's gonna be able to get there.
[11:58]
Claire Vo
And I wanna call out some things. So we talked about using the Anthropic Claude models in particular for the actual coding of the script. But you're relying on the OpenAI models for the image analysis. Why OpenAI versus any other models? Is that, like, stick with the one that you love, or it was the first one that did a good job for you, or do you feel like it's particularly good at image analysis? I'm curious why you select those different models for different use cases.
[12:24]
Tim McLear
Yeah, it's mostly that. It's the first one, like they were the first one who had. They had a vision preview on their API. They did it before Claude and like I had built up enough of an infrastructure using that API call that it was like the switching costs were too much, you know. Yep. All right, so let's see what we got this time.
[12:42]
Claire Vo
It's much more detailed.
[12:43]
Tim McLear
It is, it's much more detailed. So the image shows a street scene on the main street of Cascade, Idaho. There we go. We know where it is now. Captured in 1941 by photographer Russell Lee. We've got photo credits. All right, so this is a great example of, like, you add the guardrails and you're going to get more detail, but you're also just going to get facts. Right? Before, I don't know if it's still up here somewhere. Yeah, before it was a small rural main street. Now it is the main street of Cascade, Idaho. We can imagine this getting duplicated in various ways. This image has embedded metadata. Maybe it's a website that we're going and gathering it from. But effectively this is where it all started. It started with a single Python script that I was running on my computer and I was like, this is awesome. My database software is advanced enough to call external scripts. You can kind of use any database to do this, Airtable, whatever, but you just need something that has an API and that can call an external script or webhook or something. So this is where we started. And now I'm going to switch my screen share to a remote machine, like a little Mac Mini that I have running in my office. It's hard to at this moment. It's a more complex Cursor workspace. You can see. Maybe I'll bop into the rules. Basically what this is is a REST API, so that every image file, video file, music file, anything that ends up in that database that we looked at at the beginning pings off of this REST API for all kinds of different, like, metadata tasks. If I pop into the jobs folder here for a second, we could zero in on, like, basically what we were just doing, but the current iteration of it. So I call it Autolog because the process of writing this in for years, the manual data entry, is called logging. So it's not the cleverest name, but it fits. And you got a five-step process here, basically. First we're going to gather the info, meaning file specs: how big is the image?
Is it a JPEG, is it a TIFF? We're going to copy the file to our server, we're going to name it our ID number, we're going to parse it for metadata. Is there any metadata? If there is, great. But either way, we're going to look for more information on the web in this step four here, scrape URL. And then once we know everything we could possibly know about that image, we're going to generate a description for it. And when you imagine how this might work for video, well, like, video is itself just 24 images in a second, plus some audio. And so basically this just gets scaled up to deal with video files too.
[15:16]
Claire Vo
Are you using the same model for video files? Are you taking them extracting the stills and putting them through OpenAI or using a different model?
[15:25]
Tim McLear
I use a different model for. So I have the video files requires, like, two levels. Most video, like, AI models out there seem to do basically some version of frame sampling. So it could be extremely expensive if you were sending all 24 images every second to an API. Right. So I pull at five-second intervals because I'm cheap. Some others maybe pull in a smarter way, maybe at, like, lighting changes or something like that. Like, there's different ways of thinking about the frame sampling. So for the frame captions themselves, I will use a cheap model. I'll use, like, GPT-5 nano. But then for the. And I can go in and show you a prompt here which maybe illustrates this. I have frame prompts which basically ask for just, like, a prompt of an individual still image extracted from video, but then I have a larger parent prompt. You can see that my prompts have gotten slightly more sophisticated over time. Basically what this does is it sends every single frame that we've extracted from a video file. It sends any of the audio we've transcribed from that video file. It packages it up into this elaborate prompt and it sends it to a reasoning model. And the purpose of that is to say: these are all the video events that we have observed in this video. Here is, like, a massive text file of data. Tell me what you think is happening in the video.
[16:52]
Claire Vo
Got it.
[16:53]
Tim McLear
Yeah, yeah.
[16:54]
Claire Vo
I, you know, maybe, maybe tip from one of our other How I AI guests. But I found that the Gemini, the Gemini models are quite good with video. It's actually what we use to do our podcast raw recording to both highlight stills and a blog post that I put out. I process them through the Gemini models and have had a lot of success.
[17:17]
Tim McLear
And it just pulls out, like, the stills that might be... it just, it...
[17:20]
Claire Vo
Automatically pulls interesting stills. It actually gives me interesting stills plus 5 seconds, or, like, plus or minus 5 seconds, because sometimes the guest and I are looking ridiculous.
[17:32]
Tim McLear
Yeah, yeah, yeah, of course.
[17:34]
Claire Vo
So tip to anybody out there with video who hasn't tried the Gemini models. I find those particularly good for this use case.
[17:41]
Tim McLear
You might have just, you know, added something to our little roadmap here.
[17:45]
Claire Vo
Well, and so, and then I'm curious about the audio side of things. So I kind of, you know, I play with the Gemini models for video. This still makes tons of sense to me. Tell us a little bit about the audio side of things.
[17:58]
Tim McLear
So the audio is also I now I feel like I'm an OpenAI shill. Everything I'm using is OpenAI and I think except for the coding, which is interesting, but I think it's just habit. I use Whisper for audio. So like Whisper is an incredible open source model for speech to text detection. Even the like medium sized model does a pretty good job. And what I do is, and I can pop back into the database software maybe to like illustrate this. What I do is I extract, you can see like frames pulled every five seconds and there's a caption associated with each frame and then there's. This is a shot of an alligator in a swamp. So he doesn't have any audio, he wasn't talking. But I basically pull audio at 5 second increments so that when we send those like video events up to the reasoning model, we are sending a full transcript, but we're sending it like kind of like pegged to the moment in the video that it happened, if that makes sense.
[18:51]
Claire Vo
Yep.
[18:52]
Tim McLear
So the transcription is all happening, you know, on my back end over here. Everything. Like I think I could probably open up the console and see like there we go. Like someone just sent a job through not that long ago. Like I can kind of come in here and see what my colleagues are doing as they ping my API all day long.
[19:11]
Claire Vo
Great. And so you're pairing a snapshot image every five seconds from a video, the five second transcript of the audio speech to text via Whisper metadata, if you have it, parsing that all together and then getting a very robust description and analysis of the content that you have available in back in this tool that you're using to archive, log, manage all, all your assets.
[19:39]
Tim McLear
Yeah, and like I said, that tool could be kind of agnostic. Like, you could do it in a Google Sheet if that's, you know, if that's what you like. But I like this. We've been using it for a while. Everything we just talked about is how we kind of get to, like, metadata that we can read, right? Like, generative metadata that we know is accurate because it's kind of been put on these guardrails by our metadata extraction steps. And then also it provides this, like, nice visual for us. We can see what this thing is at a glance. But the next step of this, now that you have this, like, API running in the background, is you can generate something that maybe I can't read, but the AI can read pretty well, which is vector embeddings. So I'll jump back to stills for this because I think it's maybe an easier illustration of it. Every asset in our database gets put through two modes of embedding. We'll send the thumbnail through and run it against an open source model. I use CLIP for this and I'll generate an embedding off of that, and then we'll send the description through. I use again an OpenAI text model for this and get an embedding for that, and then we'll fuse them. And the purpose of that is that now we have the ability to discover things semantically. Like, prior to this, and I think in a lot of film production today, you're working with exact text search. You know, like, if you search for dog, but, you know, somebody wrote in puppy, you're not finding that image. And so this has been, like, kind of the most exciting part of it. Not necessarily where I knew it was going when it started. Like, I was just excited to generate a description, right? But now the ability to discover semantically is, I think, you know, the most, the most robust part of the system.
[21:22]
Claire Vo
So what I love about this, I mean, a couple of things. One, you've really pushed every step of the way. You know, you could have stopped at, like, we got good descriptions, or we got the structured metadata out and now I have a script that runs it. You could have stopped at images only, but you took it to video and audio. You could have stopped at structured data only, but you went to embeddings to get semantic search. So I love just the breadth of applicability of the AI in this process. But what I probably love more is I doubt this was anybody's favorite part of their job. Like, I doubt it was anybody's favorite part of their job to be like, I'm going to go read some Library of Congress metadata.
[22:01]
Tim McLear
It used to be my job, so I can tell you firsthand: not my favorite part. And I think the best argument I have for all the work I've done creating this system is that the same people who used to write this data were the ones who are responsible for doing the research. So you've now freed them up to just look more, right? Like, maybe now we could gather 25,000 still images for the Muhammad Ali project, because you have that much more time. You're not just, like, copying and pasting stuff off a website to put it in this form, you know.
[22:30]
Claire Vo
Well, and you probably get to select better assets to use in your content from this big archive of data, because they're more discoverable and because you have more confidence in the source and the content of that data. So I bet it up-levels the quality at the end of the day, because you have just much more data to work off of.
[22:51]
Tim McLear
100%. I mean, like a real quick example of that too is like, I'm going to use Abe Lincoln here, which is maybe not the best use of this image. But embeddings enable us to find things in ways we never would have thought to find them before. So, like I have a button down here where when I click it, what it basically is going to do is reverse image search within our own collection. So if I'm an editor and I like an image and this is going to take a while because I'm not on site, but if I like an image, I can click the find similar button and it's just going to go and find every image that kind of has that vibe. You can see here we have a duplicate of this one, but then there you go, it recognized the man and it started pulling in other portraits.
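The "find similar" button amounts to a nearest-neighbor lookup over stored embeddings. Here is a minimal brute-force Python sketch; the asset IDs and three-dimensional vectors are toy values invented for illustration, and a collection of tens of thousands of images would more likely sit behind a vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_similar(query_id, embeddings, top_k=5):
    """Rank every other asset by cosine similarity to the query asset."""
    query = embeddings[query_id]
    scored = [
        (asset_id, cosine(query, vec))
        for asset_id, vec in embeddings.items()
        if asset_id != query_id  # the clicked image itself is not a result
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical asset IDs with toy vectors; real embeddings have hundreds of dimensions.
    library = {
        "portrait_a": [0.9, 0.1, 0.0],
        "portrait_b": [0.8, 0.2, 0.1],
        "battlefield_map": [0.0, 0.1, 0.9],
    }
    print(find_similar("portrait_a", library, top_k=2))
```

A true duplicate scores close to 1.0, which is why the duplicate shows up first in the demo before the other portraits.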
[23:31]
Claire Vo
This episode is brought to you by Brex. If you're listening to this show, you already know AI is changing how we work in real, practical ways. Brex is bringing that same power to finance. Brex is the intelligent finance platform built for founders. With autonomous agents running in the background, your finance stack basically runs itself. Cards are issued, expenses are filed, and fraud is stopped in real time without you having to think about it. Add Brex's banking solution with a high-yield treasury account and you've got a system that helps you spend smarter, move faster, and scale with confidence. One in three startups in the US already runs on Brex. You can too, at brex.com/howiai.
I love this. Okay, so this is more of your archival and footage data, but you capture a lot of stuff in the field where people are not sitting in front of Cursor or their desktop looking through these assets. And I know that you use some vibe coding and a creative approach to get more information about those assets. Could you walk us through that?
[24:45]
Tim McLear
Yeah. So the next use case is an app that I developed for archival research in the field. I think that we really pride ourselves on, like, turning over every rock, not just relying on what's digitized and available online, and going and visiting physical archives. And so the process of visiting a physical archive is basically: you have a bunch of folders that you pull ahead of time. You arrive there, and your goal is just to snap, like, low-resolution iPhone snaps of everything you can possibly get. And so you're snapping the front of the image and you're snapping the back of the image, because the back is typically where there's going to be, like, a scrawled description or maybe an accession number, an ID number that the archive has added themselves. And so this process used to look like: you show up at the archive, you take iPhone snaps for two days, you get back to the office, and you have the messiest camera roll you've ever had. You cannot actually pair your fronts to your backs because it just got out of order somehow along the way. And so the goal was basically to make that process a little better. So I vibe coded this iOS app to deal with this problem. And I tend to just, like, speak in screens. Maybe it's because I'm a visual person. The way I deal with it is I just think, okay, I see a screen that does this and a screen that does this. I imagine a button that does this. And the purpose of this was basically: I want people to be able to create collections for each folder they're capturing. I want them to be able to snap a front and a back, like the flip side of the image, so that they can easily associate those. So the file names associate them. And I want to immediately transcribe any information on the back and embed it into the original image.
So now I have this app called Flip Flop. I ask ChatGPT at the end of my dog walk to generate some kind of spec doc or requirements doc. It pretty much does it in one go. If you chat with it for 30 minutes, you know, you can get a lot done. And then I fed this PRD to Claude Code, and it didn't build the whole thing in one shot, but it certainly built the UI in one shot. And so I guess maybe we should just jump into, like, the actual app.
[26:57]
Claire Vo
Yeah, let's do it.
[26:59]
Tim McLear
So Flip Flop, which is my cute little name for it, is basically designed to capture those fronts and backs that I was talking about. So you have three screens here. You've got a collection screen where you're going to create your folders. You've got a capture screen where you're going to take your images. And I'll just quickly highlight this part, which is where you kind of have your AI processing options. So I allow people to define a separate prompt for what I call the flip side of the image, the front, and the flop side of the image, the back. And so in this example, I'm going to show you some photos of my dog, and the flop side of the image is going to have some text on it. So our prompts here are really just designed to get a decent caption from the image and to transcribe any text that we see on the back. So let's create a new collection. We're going to call it How I AI. That's good enough. There's also an option here to add more context. You know, the AI loves context. And so you can imagine if you're digitizing an entire collection of, you know, someone's personal letters or someone's portrait photographs, you would add that kind of thing here. But for now, we're just going to create a collection, tap into that collection, and capture. So here we go. It's a screen share within a screen share. We're going to not care about the glare too much. I'm going to capture the front side of this image of my dog Tony's third birthday. I now have the option to add notes if that's what I want to do. Or I could just add a flop side of the image right here. And when I complete that, it will have, because it's lightning fast, already sent it up to OpenAI for a description and embedded it. And this is the really crucial thing, because you just saw the first system: I had it embedded in the image metadata itself.
So the flop details have the transcription, "Tony's third birthday," and all of that will show up in what we call EXIF metadata, which is just the image metadata standard.
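Writing a caption into the file itself can be sketched outside the app too. The episode does not show how Flip Flop writes its metadata or which field it uses, so this Python example swaps in the ExifTool command-line tool and the EXIF `ImageDescription` tag as stand-ins; both choices are assumptions, and the function names are hypothetical.

```python
import subprocess


def build_exiftool_command(image_path, description):
    """Build an ExifTool invocation that writes a caption into the image file."""
    return [
        "exiftool",
        "-overwrite_original",               # do not leave a *_original backup copy
        f"-ImageDescription={description}",  # EXIF caption field (assumed choice)
        image_path,
    ]


def embed_description(image_path, description):
    """Run ExifTool; requires the exiftool binary to be installed and on PATH."""
    # Passing a list (not a shell string) keeps quotes and spaces in the caption intact.
    subprocess.run(build_exiftool_command(image_path, description), check=True)


if __name__ == "__main__":
    # Prints the command only, so the example runs without ExifTool installed.
    print(build_exiftool_command("tony.jpg", "Tony's third birthday"))
```

Because the caption travels inside the file, any tool that reads EXIF can show it, with no access to the original app or database required.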
[28:59]
Claire Vo
Got it. And just for people that may have missed it: instead of simply generating kind of the text description and storing it in a database relative to the original image you took, you actually now have this structured metadata on the image file itself, which again, like, what a pain.
[29:17]
Tim McLear
Oh, a giant, giant pain.
[29:19]
Claire Vo
Yeah, a pain to do manually. And so now anytime anybody uses one of these images, even if they don't have access to this app, that image is embedded with that metadata.
[29:34]
Tim McLear
100%. So you could pull this onto any computer or any app, anything that can read underlying metadata, and it's going to be able to see that this was Tony's third birthday. So that's structured metadata in the sense that we've now structured the actual information about the image. But the other thing that's really crucial, honestly, is that we've structured the files themselves, right? So you can see they're getting named in a particular way. And so we've moved from, like, camera roll mess to files that are going to sort on your computer, that you're going to be able to import cleanly, and where you're going to be able to distinguish easily what's the front of the image and what's the back of the image. And that has, I think, been the other unlock. Like, I had two colleagues out in the field a couple weeks ago, and they came back with 1,400 images. And I don't think that's only because they were able to use Flip Flop to capture it, but I think Flip Flop is certainly making the process easier since they've gotten back.
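The exact naming pattern is not spelled out in the episode, so the Python sketch below shows one hypothetical scheme with the properties described: fronts and backs pair up by name and sort together. The function names and the `flip`/`flop` suffixes are assumptions.

```python
import re


def slugify(name):
    """Lowercase the name and replace runs of non-alphanumerics with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def capture_filenames(collection, sequence, has_back=True, ext="jpg"):
    """Return front/back filenames that share a stem and sort next to each other."""
    stem = f"{slugify(collection)}_{sequence:04d}"  # zero-padded so 0010 sorts after 0002
    names = [f"{stem}_flip.{ext}"]                  # front of the item
    if has_back:
        names.append(f"{stem}_flop.{ext}")          # back of the item
    return names


if __name__ == "__main__":
    print(capture_filenames("How I AI", 1))
    # ['how-i-ai_0001_flip.jpg', 'how-i-ai_0001_flop.jpg']
```

Since "flip" sorts before "flop" alphabetically, each front lands directly ahead of its back in any file browser, which avoids the out-of-order camera roll problem.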
[30:26]
Claire Vo
The thing that I want to call out for folks, maybe a general takeaway here, is these AI models are so good with files, and code can do a lot of stuff with files. And for a lot of the people we talk to, Markdown is the file type du jour these days, which is just a specially formatted text document. But if you start to look at other file types and really understand what can be put in a particular file type, you can actually discover some pretty interesting things you can do with a combination of AI and coding to make those files much more useful for your use case. So this is one of these takeaways where I'm like, I haven't thought about what can be embedded in an image file or what can be embedded in a video file. And even just asking, you know, ChatGPT or one of your general models, hey, I'm working with an image, how can I load it up with as much context and specificity as possible, what's available to me, and then using that as a jumping-off point for what you do is a pretty interesting use case of AI.
[31:36]
Tim McLear
I didn't even know. Like, I'm very familiar with stills' underlying metadata fields, but I didn't really know what was available in audio or what was available in video files. And I just sort of go into Cursor and I ask, right? Like, now we have a music workflow, which we're not going to look at, but where we embed artist, album, kind of like licensing data into any music we consider for a film. And I didn't know that there was a metadata field we could just store that in. But of course there is. You know, somebody thought of this a long time ago.
[32:04]
Claire Vo
Yep. Amazing. Okay, we have one last use case, which, Mom, if you're listening, I think you're gonna like this one. My mom's a genealogist, so I think she's gonna like this use case. But let's show it first, and then I'll call out, Mom, where I think you can use it.
[32:20]
Tim McLear
Okay. All right. So you can imagine in our films, we work with a lot of documents, and we're not always interested in the entire document. Sometimes we just want to transcribe maybe part of it. Maybe we want to translate and transcribe part of it. Like, take this newspaper document, for instance. Maybe the Arkansas State News is the article we're interested in. That's the transcript we want to be searchable. That's what our editor might want to consider for the film. We can't just, like, put this in Adobe Acrobat and OCR the whole thing. It's not going to work. And even more than that, the quality of the image would not work with most OCR engines, you know? So AI is really good at OCR of old documents. It's really good at handwriting. It's pretty good at translation, too. So I built this, and we're not going to get into the building necessarily, but this is one of the few, like, Xcode builds I had to do. So this is a Swift build, a little Mac menu bar app. It's called OCR Party, which stems from the fact that we're just OCRing part of the image. You've got to have fun with these things. And let's see, we're going to open up that newspaper in OCR Party. We're going to get, like, a little preview window. So let's say actually what we want is "Coolidge seeks peace in the world." So let's zoom in a little bit. Let's open up our cropping tool. This little thing down here is basically a choice between macOS Vision and an AI API call. And the purpose of that is because sometimes people don't trust AI, you might have heard. And so I built that in as an option, essentially. I would think the AI option gets used more. But nevertheless, now you're going to select just the part of this article that you care about, or this paper that you care about. And you can see there's, like, a crease in the paper. There's a weird black mark here. But you can imagine we submit this for OCR, and now we have just that text that we pulled.
We're also calling out for our editors where on the page they're going to be able to find it, if they want to sort of zoom in on it or crop to that particular article. And I can't exactly remember what text we were looking at, but it certainly completed those sentences where there was a black mark, right? So AI was able to kind of infer, to the best of its ability, what that sentence might have said. And, you know, if this ends up in a film, I can guarantee it would get fact-checked later. But for the purposes of gathering documents, thousands of documents, this ability to kind of, like, precisely OCR has been a nice little unlock for us.
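The crop-then-OCR step, along with the note telling editors where on the page the text sits, can be sketched in Python as follows. This is not OCR Party's code, which is a Swift app. The sketch assumes the selection is stored as normalized `(x, y, width, height)` fractions with the origin at the top left of the page, and both function names are hypothetical.

```python
def crop_box(page_width, page_height, region):
    """Convert a normalized (x, y, w, h) selection into pixels: (left, top, right, bottom)."""
    x, y, w, h = region
    left = round(x * page_width)
    top = round(y * page_height)
    right = round((x + w) * page_width)
    bottom = round((y + h) * page_height)
    return (left, top, right, bottom)


def describe_location(region):
    """Name the third of the page (e.g. 'top right') that holds the selection's center."""
    x, y, w, h = region
    cx, cy = x + w / 2, y + h / 2
    # min(..., 2) keeps a center sitting exactly on the far edge inside the last third.
    col = ["left", "center", "right"][min(int(cx * 3), 2)]
    row = ["top", "middle", "bottom"][min(int(cy * 3), 2)]
    return f"{row} {col}"


if __name__ == "__main__":
    selection = (0.75, 0.0, 0.25, 0.25)  # an article in the upper right corner
    print(crop_box(2000, 3000, selection), describe_location(selection))
```

Only the pixels inside the box would be sent to the OCR engine or AI model, and the location string is stored next to the transcript so an editor can find the article on the full page later.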
[34:57]
Claire Vo
One thing I also want to make sure people take away from this episode is we've seen basically three form factors of apps. So yes, they've all used AI, but you've been able to swap between sort of a Python API service that gets called by another software application or database, an iOS app that you can run on your phone, and then, like, a little desktop toolbar widget. And what I love about this moment in AI with regard to software engineering is, if you have basic software engineering practices and you know enough to be dangerous, then, yeah, you can vibe code a Swift app to run on your local desktop.
[35:40]
Tim McLear
Hyper specific app. Yeah, no one was going to make me this app. And so the ability to make like an extremely specific app that makes a workflow, you know, on my team and my company easier, it's been, it's been an unbelievable moment.
[35:54]
Claire Vo
Yeah. I would say the TAM for this app is like you.
[35:58]
Tim McLear
Yeah, yeah. I mean, I think I could sell it to like two colleagues.
[36:02]
Claire Vo
Well, and then my mom. So what I was going to tell you is my mom is a genealogist for the Daughters of the American Revolution, of which I am one. Fun fact on Claire.
[36:12]
Tim McLear
Oh, no way.
[36:13]
Claire Vo
And she does the lineage tracing, and do you know how many times she screenshots something and is like, can you read this cursive? Like, what in the world is this name? And it's, you know, one name in a big image. And I'm like, yeah, I'm gonna drop this in ChatGPT and I'll tell you what I think it says. And I think its ability to read handwriting and old typefaces, and kind of understand the nuances of spelling and things like that, is just really, really interesting for these sort of research use cases.
[36:45]
Tim McLear
Yeah, we didn't look at a handwritten doc here, but that is definitely something happening at our company. Like, the ability to read letters that we could not read before, and also just other languages, right? And then we immediately have that text too. You have letters written in some kind of cursive scrawl from the 17th century that are now translated to English and made legible for you.
[37:05]
Claire Vo
Amazing. Well, we've seen three great use cases. I am sure you are the hero on the team for this kind of stuff because I can imagine again, people.
[37:16]
Tim McLear
Might be tired of hearing me talk about AI, but thank you.
[37:19]
Claire Vo
Yeah, but I mean, this is hard stuff. It's tedious work to do. It requires a lot of time, a lot of detail orientation. And I'm sure people love using this information to produce amazing things, but it's probably not their favorite thing, like, zooming in and squinting at the text to try to get it as accurate as possible.
[37:41]
Tim McLear
Trying to, you know, automate away painful processes. Right. Not the things people liked.
[37:47]
Claire Vo
Automate away toil. Yes, that's what we want to do. Okay, well, we're going to do a couple lightning round questions, and then I'm going to get you out of here to, you know, go digitize a thousand more images. So the first thing I want to ask you about is just your approach to learning. It seems like, from what I'm seeing, you're pretty fearless about new technologies, new things. I think this moment is such a critical moment for upskilling and learning. How do you think about learning in this moment?
[38:16]
Tim McLear
I think that one of the reasons that I find tools like Cursor or Claude Code kind of intuitive is that, to me, there's a parallel with creative software. So at various moments in my career, I have been deep in Photoshop or deep in Adobe Premiere or Avid Media Composer, whatever it is. And that software is so complex. It's like a maze of tool menus. And you end up on Reddit and on YouTube doing your research, trying to just, like, figure out how to accomplish the thing. And I think that that's essentially what a lot of these tools are today too. Like, I've been on Cursor YouTube and Cursor Reddit and learned tips and tricks from the vibe coding people of the internet. And, you know, I think it sort of starts from knowing what could be done or what's possible, and the path to get there is swifter than ever before.
[39:07]
Claire Vo
What I like about this: I started sort of my fascination with technology in these creative tools. This is, like, pre-Photoshop, where I would go, how can I make my text look like liquid gold? And I would follow these, like, five-step graphics tool tutorials. And what I love about this moment in vibe coding or AI-assisted engineering is coding feels so much more creative than technical, where these tools feel really like creation engines to me more than functional tools to write code. And so I love that parallel because it's what's made me so excited about technology my entire career. And I think it's why I'm so leaned in: this moment activates that same feeling of, like, oh, now I can make this thing that I didn't think I could make before.
[40:00]
Tim McLear
I think that there are a lot of people in my industry, too, who have a kind of creative brain and creative approach to these things. You know, maybe looking at a Cursor window right now, when you have no idea what it is, is a little scary. But I actually think that they are better suited for the work than they might know.
[40:16]
Claire Vo
Well, let's talk a little bit about your industry, because I know that the film and creative world is deeply skeptical of AI. Sometimes we wade into the waters of AI video generation on this podcast and get a little feedback. And I totally understand. I have family that's in the creative industry. I'm curious, you know, what's your point of view on AI, particularly in the film world? What are you excited about, and where do you think concerns are really warranted? And then where do you think the most practical applications are?
[40:46]
Tim McLear
I think today it's sort of where we started at the top. The practical applications are more in, like, tooling than they are in creation. But I do think that the creation's gonna get there. Like, today I play with all the generative video models. How can I not? They're super fun. They are not at professional-grade quality yet. Like, for the amount of time you spend throwing tokens at even the highest-end video models, you're not going to be able to match your shots that well. You're not going to be able to match the footage you shot yourself that well. And so I don't think they're there yet, but, I'll be honest, they're going to get there. I think that they are still exciting to me. But I would separate a couple of things. Like, in the nonfiction world, I think people should be careful. I think we should not be generating archival footage. We should not be trying to fool our viewers into thinking that there was video in 1750, you know? And I think that that's the part that's a little scary. And then, of course, there's the job displacement aspect of things. I think people are scared. If you film stuff for a living, you're definitely scared that people are going to be able to just use text to generate that same video you used to shoot. So I don't think anybody has good answers to that part of it. But my approach has certainly just been to jump in and learn the tools. They are going to be here whether we want them to be or not. And I think that they have a lot of practical benefits today that are less scary. Yeah.
[42:17]
Claire Vo
The best advice I can give to people... And I'll say this honestly: of all the spaces, the one where I have the most job displacement concern is video generation for non-archival, non-documentary work. With commercial use cases, you just see how it could be very applicable. And the best advice that I can give to people in this moment is: the more you learn the tools, the better off you will be, whether or not you love where the tools are taking us as an industry or as a culture. Knowledge is power. And so the more you learn and understand, one, you can identify opportunities where it does add value, even in your creative process. And two, you're going to be differentiated in the market from a job perspective, because you're going to have a more robust sense of what's available in your industry. And I think that stands for people in your industry. I think it stands for people in my industry, in technology. So I just say there is no harm in learning this stuff.
[43:19]
Tim McLear
Yeah, absolutely. I also think that, like, there's a place in the process for it which allows you, like, a place to learn without thinking it needs to end up in the final product. Right. Like, you can use video models for storyboarding all day. You can maybe prove whether or not that shoot is worth spending that money on. Now you've learned how to use the video models a little bit, and, you know, you haven't necessarily displaced anyone, but you've, like, made your production a little bit more efficient, a little smarter. Maybe you've shot better footage as a result of it, you know?
[43:45]
Claire Vo
Yes, but we're not. We're not generating fake archival footage of, like, gang.
[43:49]
Tim McLear
We're not. We are not doing that. Definitely not doing that. And, like, PBS, which is where most of our films end up, has a lot of guidelines around that, and I think that's a good thing. But it's the other stuff. It's commercial, it's visual effects. Like, a lot of stuff's gonna get easier. And so it's coming one way or another.
[44:05]
Claire Vo
Great. Well, last question. I have to ask you: when you're on your dog walk with ChatGPT, doing voice mode, and it's not listening to you or not giving you what you want, what is your personal prompting technique? Especially because you use voice. Like, I'm willing to type things that I don't know if I'd be willing to say. So what's your technique here?
[44:27]
Tim McLear
It definitely is different when you have to say it out loud. I am super nice to the AI. I, like, can vividly remember the one time I was mean to it. I don't know where this is going, so I'm going to be nice to all the models. What I do is, for lack of a better way of describing it, I just start over. I know that a lot of these things have ways of, like, consolidating the context window now and sort of summarizing. But I will ask for what I call, like, a resume work prompt. I'll be like, this isn't working. I want to resume work later with another AI dev. Can you give me a prompt with everything they'll need to know? And typically, what you'll find is that that prompt shows you where it was off, you know, in its summarization of what it was doing. I'll be like, oh, see, I wasn't asking for that. That's why we were not communicating. And then I'll take that resume work prompt, I'll prune it a little bit, pop it into another chat, and then, you know, you'll find that you wish you hadn't beat your head against the wall with the previous chat for 20 minutes.
[45:24]
Claire Vo
You know, I am also team be-polite-to-your-AI. But then again, you hurt the ones you love the most. And I've found myself occasionally getting testy. And you know when I stopped being mean to AI? It's when reasoning really started to show, and I could see it reasoning about how upset I was.
[45:40]
Tim McLear
It was, oh, it'll be like the user is mad at me right now.
[45:42]
Claire Vo
"The user is really frustrated with me right now. I need to totally rethink my..." And I go, sweet, sweet baby AI, I'm sorry, I apologize. I'm not that mad at you. Okay, so create a, you know, return-to-progress prompt. Really get the summary. Use that to understand if there was some misunderstanding. Improve that, and then just start fresh. That's great. Well, Tim, this has been super fun. So much for me to learn. I have tons of ideas, even just for my day-to-day life, about how I could use this. I have kids, so I probably have 30,000.
[46:13]
Tim McLear
Let me know if your mom wants the OCR party.
[46:16]
Claire Vo
I will. She'll love it. Okay, Mom, I have gotten you your first vibe coded app direct from the podcast source. Tim, where can we find you and how can we be helpful?
[46:25]
Tim McLear
Yeah, I'm not that active on social, to be honest, but I'm on LinkedIn. You can find me on there. I have a website that is itself a fun vibe code project, so you can find me at timmacular.com. I have a little chatbot there, GPTim. You can go chat with him and learn a little bit more about me and my work. And then other than that, I would say tune in to Florentine Films' upcoming production. We have a series about the American Revolution coming out in November on your local PBS station.
[46:53]
Claire Vo
My kids are obsessed with the American Revolution, so it sounds like it's in the family. Yeah, we will be big fans. Tim, this has been great. Thank you so much, and thanks for joining How I AI.
Tim McLear
Thank you for having me.
Claire Vo
Thanks so much for watching. If you enjoyed this show, please like and subscribe here on YouTube or, even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time.