A
Welcome to season two of Derms on Drugs, a video podcast brought to you by Scholars in Medicine, the best educational platform in dermatology, provided at no cost to medical providers. Derms on Drugs is where cutting edge derm meets hit or miss comedy. I'm Matt Zyrus from Doc's Dermatology, and each week I'm joined by my residency buddies, Dr. Laura Farish from the University of North Carolina and Dr. Tim Patton from the University of Pittsburgh, and we use our 60 years of combined derm experience to discuss, debate, and dissect the hottest topics in dermatology. It is everything you need to know to be on the cutting edge of derm, and you'll actually have some fun listening. New episodes drop every Friday on Scholars in Medicine, Apple Podcasts, Spotify, and other major podcast platforms. And there is a video component that has the key figures and tables from the articles we talk about. So we are so excited this week to be running a deep dive into one of the hottest topics out there, artificial intelligence in dermatology and in medicine in general. And we have the perfect guest to get into it with, Dr. Farah Kamanger. I'm terrible at pronouncing last names from Silicon Valley. She is the owner of the company that runs DermGPT, which we are actually going to be talking about a little bit today. There's some interesting literature out there that, from what I could tell, has no commercial connection with DermGPT. So I'm excited to see where we go. Dr. Kamanger, great to have you here.
B
Thank you. Thanks for having me. Very excited.
A
And let's just go ahead and get into it. So why don't we start with Dr. Patton? Patton, what do you got?
C
My deep dive paper: the May 2025 edition of Advances in Dermatology and Allergology. Allergology. I think that's how that's pronounced.
A
That sounds right.
C
Sure. It's tight. That's why I didn't go into allergology.
D
Couldn't.
C
Yeah. It's titled Comparing Physician and Artificial Intelligence (ChatGPT-4) Responses to Common Patient Questions Regarding Hidradenitis Suppurativa: a single-blind study, by Lewandowski et al. The study was performed in 2023 at the University of Gdansk in Poland. 20 questions about HS were answered by ChatGPT-4 and also by the dermatologist. That's air quotes for the listeners. Maybe Poland only has one dermatologist in it. They didn't really go into details on who answered those questions, but that's what they said. They were labeled answer one, answer two, and respondents, which was 30 HS patients and 31 physicians, were asked to rate the answers using a Likert scale for quality, empathy, and satisfaction. So ChatGPT-4 kind of crushed the dermatologist. Patients rated the quality, empathy, and satisfaction of ChatGPT-4's responses higher than the dermatologist's. Physicians also rated ChatGPT's answers as better. Figures 2 and 3 lay out the data pretty nicely. Robots are better than live dermatologists. Let's just face the facts. 88% of the time patients said, I like the ChatGPT-4 response better. So we'll all be out of jobs pretty soon, and I for one welcome our new chatbot overlords. But wait. When respondents went through ranking all of the answers, at the very end of everything they were asked the question: given that AI can answer your questions more accurately and empathetically than a doctor, who would you rather receive an answer from, the doctor or the AI? It's almost like the authors knew that the AI would be better. It's almost like they came up with that question ahead of time. Anyhow, 70% of the patients and 80% of the physicians said they'd prefer to get HS answers from the dermatologist. So for now, I guess our jobs are secure. It's just such a weird result. Why? What's the explanation?
I mean, yes, you like that human interaction, but I mean as time goes on, is that going to become less and less important? It's, it's pretty interesting.
A
And this is from 2023, right? Yeah, so, right. Chat GPT has gotten a hell of a lot better and the dermatologists have not gotten any better.
C
Like, right.
A
It's really interesting, that anti-AI-ness. Like, I did Tesla for a while, got rid of it now, because I was not a good driver. But even though my family knows I'm a terrible driver, like, I get into accidents all the time, they still preferred me driving to me letting the Tesla drive. And I couldn't figure it out. I was like, what? I'm a terrible driver. This thing is better than me. And they still wanted me to drive instead of it.
D
The only thing I can think of is the black box thing. Like, we don't understand it. And I think that's probably a lot of what it is. But I also think it's going to get better with time. Like, do you remember when Uber came out? They're like, here's this great thing: you're going to go on your phone and you're going to tell a stranger where you are, and they're going to come put you in their car and drive you places. I was like, that will never take off. I would never do that. Now I'm like, I like that way better than a taxi, right? Like, way better. I just think it's going to take a little bit of time to get people around to it.
A
Farah, your audience is different, right? This is kind of talking about the patients as the customers. Your customers with DermGPT are dermatologists. What has been your experience with this? Do you get people who call you like, I can't believe you're doing this, you're going to replace us with this? And other people are like, oh, this is great? What's your experience?
B
Yeah, it's really interesting. So I think what we're talking about is the technology adoption cycle, like that classic curve. You have your super early adopters that are into things, like, yay, the Uber or the Waymo, let's do it. And then you have kind of the stragglers, and then you have the really late people who are just like, okay, this is just the norm now, and fine, I'll take an Uber if I have to. I think the cool thing we saw with DermGPT was we launched in 2023, and I think within a few months we had over 13,000 docs that had come to the site. And right now we have a good bit over 4,000 super users. So I think there's a difference there with technology in general. Like in 2013 to 2016, when all the electronic health records came out, everyone was like, oh my God. I remember I was at UCSF when Epic rolled out, and some doctors retired. They were like, that's it, I'm done here, I'm leaving rather than dealing with technology at this stage. But AI has been different. I think generative AI has been taken up a lot more easily. Even just OpenAI's ChatGPT itself, millions of people were on it, like, instantaneously. It was the fastest adoption of a technology. And I think part of it is it's intuitive to deal with. It's language based, and language is how we communicate. It's like when you talk to someone and they're talking to you in a similar language, you're able to just connect with them more easily. And then these large language models, the reason they're beating out the doctors is they're created to just be super nice, super customer-servicey. They're created for you to want to keep engaging with them. And you know, we're busy, we have lots of things to do, so the quality of our communication has maybe dropped a little. So I'm not at all surprised that in an information or communication task it would beat us out in niceness or pleasantness and all of that.
But I think we just have to realize what AI can do and what it can't do. And there's still a lot that it can't do. So we still have jobs for a little bit longer at least.
A
I think we're.
B
You think we're slowly on the way out.
C
Yeah. Matt has us going the way of extinction, dinosaurs, all that.
B
I think we're still okay for just a little while though. Yeah.
C
What do you think the explanation would be? Again, after going through this and knowing that they preferred the AI response, what do you think there is about saying, I'd prefer to talk to a human?
B
I think that's still part of the early adoption cycle. Over time, as AI is there and people know the answers are good and it's doing a good job more and more, that preferring-to-talk-to-a-human thing will probably fade a little bit once we have the accuracy and the trust. But I think the good thing is the models are not as good as they seem. It depends on the task at hand. For example, the communication piece: it's set up that way, so it's going to do better than us. But there are a lot of hallucinations that still happen. Like, a lot of the work we do with DermGPT is how do you actually make it so that it's correct a lot of the time? Because good enough isn't good enough in medicine, right? It needs to be correct. It can't anchor-bias you to the point where it gives you something wrong that you're making decisions on.
A
So I follow the autonomous driving thing closely. Like, I'm not a huge Tesla investor, but I literally got one because I wanted to see how well the full self-driving works, and so I invested fair amounts. And the biggest problem is that people hold self-driving up to a standard of perfection, whereas the right standard is 1% better than humans. And it's the same thing here. So like the one AI thing that I sent out today: I had a few non-medical friends, like, hey, you had that problem recently, go try this thing and see how well it works. And they did it, and they sent me the results or whatever, and they're like, yeah, so it didn't get it right, and AI sucks and it's terrible. And I was like, but it had your thing second on the differential. It took your doctor a year to even mention or think about the thing that it got immediately. And they're like, yeah, but it didn't get it right, and you can't trust AI, and computers are terrible. And I think maybe they are, but doctors are worse.
B
It's true. It's comparative, right? Because DermGPT is large-language-model based, but a lot of the derm AI work is trying to diagnose melanoma, and it's image based. So they put out the sensitivity and specificity for a device to diagnose melanoma, and I thought about that. I was like, what's my sensitivity and specificity? Like, how many moles have I biopsied before I got to a melanoma? It's true, and actually I'm nerdy enough that I calculated that once. It's still better than the models that are out there. But I'm biopsying a bunch of moles too. Like, we're not 100%. It's not like the melanoma walks in within five feet and we're like, that's a melanoma.
D
And you don't always know the melanomas you miss, right? So that's what's so hard. It's hard to calculate sensitivity for humans for melanoma detection.
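The self-audit described in this exchange is simple arithmetic over a biopsy log. A minimal sketch with entirely made-up counts (none of these numbers come from the episode), and note that the false-negative term is exactly the number Dr. Farish points out a human can never fully know:

```python
def sens_spec(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical year of practice: 12 melanomas biopsied and caught,
# 1 missed melanoma found later (the hard-to-know FN term), 188 benign
# moles biopsied, 4800 benign moles correctly left alone.
sens, spec = sens_spec(tp=12, fp=188, fn=1, tn=4800)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# → sensitivity=0.92, specificity=0.96
```

The "how many moles did I biopsy before I got to a melanoma" question is the positive predictive value, tp / (tp + fp): about 0.06 with these toy numbers, or roughly one melanoma per 17 biopsies.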
A
The new imaging stuff that's coming out. So we did an episode on this several months ago, and there's this new device that's like picture, picture, picture, and it's doing cross-polarized dermoscopy on every single mole on your body. I don't know if it even takes a minute to take the pictures. It's nuts. That's going to be orders of magnitude better than us, very quickly, right? Very quickly.
B
That's true.
A
I do actually think derms are safe because we're so procedurally based. What I think is likely to happen is that routine medical derm gets quicker and faster, so that we're able to meet the demand more effectively, and more of our time shifts to doing procedures. So we're all going to make more, until the robots get good enough to just do the biopsies and the surgery. I figure that's 10 years out before that happens.
D
I think the first cataract surgery done completely by a robot was recently performed. I just know this because I'm interviewing chair of ophthalmology candidates, so you're always like, what's the next new thing in your field? And like three people were like, we just had the first cataract surgery completely performed by a robot. And I mean, cataract surgery seems harder than a skin biopsy to me.
B
Right. We can do that.
D
Right?
A
Wow, that's not good.
B
And you know, the task, actually, Matt, that you talked about with the polarized photography, the task where it will beat us 100% is that longitudinal follow-up of a mole. Because if that person comes back in six months and it takes another photo of it, there's no way I'm going to remember what that mole looked like. And even our photography currently is not good enough that I could put my two pictures next to each other. So there are certain things where it's going to blow us out of the water.
A
Every single mole on their body. It's going to be comparative digital dermoscopy.
D
There is such a huge issue with overdiagnosis of melanoma, and of thin melanomas, right? And so there are things that we biopsy and then they come back as early evolving melanoma in situ. But you know what the definition of cancer is: uncontrolled growth of cells. If a lesion on your skin is absolutely not changing, it kind of doesn't matter what it says histologically. So I actually think it's going to get us to the right answer. And how many of us have had the, oh, that's an early melanoma I almost didn't biopsy? I mean, we've all had things that are like, geez, that's a 0.9 millimeter melanoma, and it did not look like it fit the ABCDs, and I could have missed that. But that would have been new or changing on photos, right? So yeah, that to me is what makes so much sense. Like, the fact that we as highly educated people look at every inch of skin on people's bodies over and over again all day long as a cancer detection tool is crazy.
A
Yeah.
D
Right?
B
That's a good point. Yeah, it's like just the merits of what we do daily. That's probably most of what I do. You know, we all do everything, but that ends up being most of what we do, right?
D
That's a lot of what we do. It's probably greater than 50% of our workforce efforts. So if we could then get the things that are really truly the things we need to see, we could just do so much more for more people, 100%.
B
And to your point, I think what AI will do is take out this kind of mundane level so we can do the more complex things. And the question is, you know, what are those next complex things that it won't be able to do as easily?
A
I think AI is going to do the complex stuff too. I think we're all going to be sitting at home with our humanoid robots taking care of us, and maybe doing podcasts so that we have something to do. Who knows? All right, I'm going to jump on to my article, which was Artificial Intelligence Physician Avatars for Patient Education: a Pilot Study. So essentially what they did, this was plastic surgery, done out of Mayo. They basically took a couple pictures of the doctor, took a short voice sample, had AI make a deepfake of the doctor, then had the doctor type out his responses, instructions for 10 common post-op questions. And then they had patients be like, okay, we're doing this experiment with AI, you're going to sit down, and it was done in a room in the clinic and the whole thing. It was just like a telemedicine visit where the avatar was on the screen, the patient was sitting there, they asked questions. The AI picked which of the scripts to use based on their question, but the AI completely generated all of the voice, the inflection, the facial expressions, all of that stuff. All the doctor did was type it out once. And so this was not looking at content; it was looking at patient acceptance of this. And the main takeaway was that patients completely accepted it. They did this radar chart, which is kind of a way of looking at different aspects of things. Usability was close to 90%. Engagement was 85%. Acceptability and trust was 90%. Realism was about 80%. And eeriness, which is, you know, the equivalent of creepiness, like, was this weird, was close to zero. So on a one to five scale, the mean eeriness or weirdness score was 1.57 out of five. And so the takeaway from this was patients were totally accepting of it. And I'm at this point pretty certain that they could now create a digital twin of all four of us.
And pretty quickly the LLMs will be able to tell the digital twin what to say, and patients won't be able to tell it's not us. Now, ethically, we'll still have to say it's not really us. But imagine if patients, instead of, you know, calling your office, hey, my blah blah blah, I'm red and peely and itchy after that cream you gave me for my acne, could do a video visit with the virtual you anytime they wanted, right? 24/7. You're going to spend only so long with them on the phone, but they can do a two-hour virtual visit with you. Well, what about this one? What about that one? And you'll be patient and kind the whole time, and you'll never get annoyed with them or anything else. We can't compete. We cannot compete, even if it's only close to us in terms of knowledge and content. When people are like, oh, but it can't do empathy, it can't do the human touch: it is better than us at that stuff. That is what a video avatar is particularly good at.
D
It does not fatigue. It can be you on your best hair day.
A
Yep.
D
Right. It can be you in your most flattering shirt color.
A
I mean, it's never like, oh, I've been in this room for eight minutes, I'm three patients behind, I need to get moving, you know, one issue per visit, you're going to need to reschedule for that. No, it will spend three hours talking to them if they want. Like, a lot of visits is just psych, it's not real medicine. I know, that's why they like it more than you. It will listen to them for days if they want.
D
So, Matt, I love how you go to the most extreme use case immediately with this, because this is you. I like it. But here's my thought: is there maybe a more intermediate step? Like, maybe this would be great for counseling. Maybe this is me counseling on what a skin biopsy is. They could have me doing this, and I could update it for anything: an excision, cryotherapy, right? I mean, I have thought about this. Like, I left something at a restaurant the other day, and I called there to ask, hey, did you find any AirPods? It was AI that answered. It's like, hi, what are you calling about? And I pretty quickly picked up that that's what it was. But it asks, oh, did you lose something? What did you lose? Like, it can ask that and then provide that. A lot of restaurants, when you call to order food, it's AI taking your order, right? Certainly scheduling visits. There is nothing magical about the people who we pay to schedule visits, right? But the magical thing is that they can only work between like 8 and 4:30, and then we can't schedule appointments at 9 o'clock at night, which is when you think, geez, I probably should go to the dermatologist. So maybe even the more mundane things, where it is much more human already, are where it could come sooner.
A
Yeah. So my private equity group is implementing this now. Like, we've got AI agents that answer the phone: oh, you're calling about this, you have a question about that, or you want to schedule, exactly what you said for the restaurant. We're doing that, and we have physicians beta testing the Emma AI scribe thing, and we're going to be rolling that out in the next year, probably the next couple of months. I mean, it's nuts. It's nuts.
D
Yeah. But this, I mean a lot of us are doing AI scribing. This is next level, which is giving you what feels like a very human interaction and in fact with a human that you know. Right.
A
Way back when I was at OSU, I made about 20 videos of me counseling.
D
And I remember this.
A
Yeah, it was like cocamidopropyl betaine, lanolin, this, that, and the other. And I used to experiment with it, like, okay, today I'm going to use the videos, and the next day I'm not going to use the videos, I'm going to counsel them myself. It's literally the exact same content, it's just me delivering it face to face versus a video of me delivering it. The video of me delivering it was at least an order of magnitude better. The patients retained the information better, they acted on it more effectively. It was just nuts. I could say it and they'd be like, well, what about this, and I don't think so. And I would have the video say it and they'd be okay, absolutely on board. It was nuts. It was nuts. And I think maybe that's why I'm so convinced this is going to replace us: it's better.
D
I mean, I think patch test counseling would be a great use for this, right? It could be you. It could give all the information. It could run their, whatever those things are you guys do. That list, that CAMP list.
B
Yeah.
D
Right. And it could say, I've gone through, you know, this is what the CAMP list is. And they could be like, but do you think it's my shampoo? And it could be like, no. Do you think it's my fabric softener? No, it turns out cocamidopropyl betaine is not in fabric softener. Like, you could have the conversation that way.
A
It will be soon. And CAMP is putting this in, like, you can scan the barcode of your shampoo and it will tell you that shampoo is okay, that shampoo is not okay. You can take it into the store with you: is this shampoo okay? It scans the barcode and, yep, it's okay; no, it's not okay. The American Contact Dermatitis Society is doing a really good job with the next version of CAMP. Yes.
B
Okay.
A
All right, Patton, let's move on. What do you got? No, Farish, you're up.
D
No, you're forgetting me. Okay, so I have a research letter in JMIR Dermatology: Evaluating Artificial Intelligence Models in Dermatology, a Comparative Analysis, by Patel et al out of UC Irvine. So this was a head-to-head, ChatGPT versus DermGPT.
A
So wait, I'm sorry for interrupting. Farah, did you guys fund this? Have anything to do with it? Is this a conflict-of-interest-y thing? Like, do we all just...
D
Happen to be in Irvine, California? Yeah.
B
So, you know, we did have, I think, some of our med students involved in this. So not a funded thing, but definitely, like, academically related, okay.
A
But not funded. Because Patton has covered on the show that when it comes to supplements, if the company funds the study, the meta-analysis shows it works, and if the company didn't fund the study, it didn't work. So it's okay.
B
That's so true, isn't it? We did not fund it.
A
Okay, all right, all right. Fair. Go ahead.
D
Okay, so basically a survey-based comparative study where dermatologists rated the answer that DermGPT gave versus what ChatGPT-4.0 gave, and they were common derm questions. The questions were written by two derm residents, and the faculty and trainees judged which answer was better. So how was it set up? Interestingly, and I was going to ask you about this, Farah, three of the questions were actually dropped because DermGPT just said, see your dermatologist for guidance. So I thought that was kind of interesting. DermGPT, maybe you guys have safeguards in there so that if it really does not have an answer, it's less likely to hallucinate; it's more likely to just send you to your doctor. And then they were... what's that?
A
And let them hallucinate.
D
And let them, yeah, let it hallucinate, exactly. So then they were blinded as to whether it was model A or model B, and then basically attendings and trainees at UC Irvine and UC Davis were invited to do a survey and say either model A was better or model B. So what about sample sizes? 64 dermatology faculty and 30 residents; sorry, that was the sampling frame, who was sampled. The respondents were 19 people total, 13 attendings and 6 residents, 19 people who actually agreed to do it. And so you had basically 258 ratings, if you take every person and all of the different answers in every single combination. So it's many ratings, but from 19 people total. Okay, so what did they find? DermGPT answers were preferred more. DermGPT answers were preferred 48% of the time, versus 28% of the time for ChatGPT. The rest were either equal or both inadequate. This was statistically significant among attendings, DermGPT was favored, and it was all to a similar degree for the residents as well. Basically there were more tight, to-the-point, better answers out of DermGPT. Each LLM also gave references, and it turned out that when they asked the raters which references were better, ChatGPT's references were preferred more frequently than DermGPT's. So the references given were more preferred for ChatGPT, but the answers given were more preferred when they came from DermGPT.
A
That sounds to me like they were trying to make it look like a balanced study. We got to find something good to say about ChatGPT, otherwise people are going to think it's crap.
B
Although I will say, I believe that, because the user interface on the larger foundation models, the way the references show up, does beat us 100%. So that's probably true, because they make it really pretty: they have the logo of the journal and they bring up the actual article. You know, for us it's a link to the PDF. So I'm not surprised that our references lost. We should put some effort into making them look pretty too.
D
Making them look, yeah, more... And I guess they also said that ChatGPT was more likely to pull from higher impact journals, like JAAD or JAMA. That was what the paper said. So, kind of interesting.
B
That's super interesting. Yeah.
A
I looked for a while and I couldn't figure this out. So you can round it off: 50% of the time people preferred DermGPT, 25% of the time they preferred ChatGPT, and 25% of the time it was either equal or both answers sucked. And I could not find anywhere where it said, for that other 25%, was it 25% of the time both answers sucked, or 25% of the time both answers were equally good? It just said 25% of the time neither answer was preferred.
D
Yeah, I don't know that it... I think it just said, do you prefer this one or the other. I don't know that it gathered that level of information.
A
Nope. It gave four possible answers: prefer A, prefer B, prefer them equally, or both are inadequate, I think was the exact terminology. Because that matters a lot, right? If 24% of the time it was both answers suck, that matters a lot, versus if it's 24% of the time both answers were equally good and 1% of the time both answers were inadequate. That was the most interesting thing in the whole thing to me.
D
And so yeah, they only gave the answer as other.
A
Yeah.
D
Which was not broken down into this was better or that was better. It could either be ChatGPT was better, DermGPT was better, or other. They did not actually break it down further, that I can tell, unless it's somewhere in a supplement that I'm not aware of, but I'm not even seeing it referenced.
A
I went through the supplement looking.
D
You did? Okay, you went through it more than I did. I could not find that. Yeah, because it did not break it down to that level.
A
All right, so Derm GPT beats Chat GPT now.
B
All right, we beat it in this one task. In this one task of answering dermatology questions.
A
So how do things like you guys work? I kind of have assumed that it's like the branded thing. Like with Verizon, they've got Verizon and then they've got some other one where it's 25 bucks a month, and it's still Verizon, it's just different branding. So I've always assumed that DermGPT is just ChatGPT or Gemini or whatever with a different label. Like, if you were to make your own thing, how does that work?
B
Yeah, no, I think that's a great question, because these tools are actually very different, and I think it's important for people to know how to use them, because they all kind of go into this bucket of AI, but it just depends on how the model is trained to work. So for example, DermGPT is what we call a retrieval augmented generation model. So you take a general foundation model, like a GPT. They're actually really good: ChatGPT, Claude, these models are excellent. So you don't necessarily always even need something specific. You could just go to these, and I often do. I love Claude, I use it all the time. The foundation models are so good. But what happens is sometimes they're too good: they're just making stuff up. So I think that's the whole problem we're trying to fix for medicine, how do you really reduce the hallucinations? And Laura, like you called it: if you don't have an answer, don't give an answer. Versus the big models are set up to, no, just say something, sound good, make the person happy. So keep engaging, keep them on the line.
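Retrieval-augmented generation, as described in this answer, means the model responds from a curated document set rather than from whatever its training data happened to contain. A minimal sketch of the pattern, assuming a toy corpus and a bag-of-words retriever; none of this is DermGPT's actual stack, and the corpus strings and refusal phrasing are purely illustrative:

```python
from collections import Counter
import math

# Toy curated "derm brain": a real RAG system would store vetted
# article chunks with dense embeddings, not three raw strings.
CORPUS = [
    "Hidradenitis suppurativa first-line therapy includes topical clindamycin.",
    "Adalimumab is approved for moderate-to-severe hidradenitis suppurativa.",
    "Dermoscopy improves melanoma detection over naked-eye examination.",
]

def _bow(text):
    """Bag-of-words vector: lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus chunks most similar to the query."""
    ranked = sorted(corpus, key=lambda d: cosine(_bow(query), _bow(d)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, corpus):
    """Ground the model in retrieved context, with the refusal
    behavior mentioned earlier: defer rather than hallucinate."""
    context = "\n".join(retrieve(query, corpus))
    return ("Answer using ONLY the context below. If the context is "
            "insufficient, reply 'See your dermatologist for guidance.'\n"
            f"Context:\n{context}\n\nQuestion: {query}")

print(build_prompt("What biologic treats hidradenitis suppurativa?", CORPUS))
```

The resulting prompt, not the bare question, is what gets sent to the foundation model, which is how curation of the corpus constrains what the model can say.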
A
Right?
B
Keep them on the line.
A
You might be too young for this, but you know, when Patton and Farish and I were kids, there were the, was it the 1-900 or the 976 numbers? It was like $3.99 a minute, and they were like porn lines. I never called any, but it was like they just, if they...
B
Keep them, keep the minutes going, 100% line, right? That's what, that's how it's an old business model, but exactly, exactly as business. These big foundational models are trying to keep you on the line. That's 100% it for us. Our benchmarks is please don't get it wrong. Please don't get it wrong. I'd rather you say, not say something than my colleague and like another state gets an answer and they obviously know it's wrong. That's more embarrassing to us than the drop, the call being dropped sooner. So. But what we do differently is when the ChatGPT APIs came out, we were like, I think one of the first people to just get the API and start building Durham GPT. For me, I've been working in health tech for a really long time, about like gosh, like 20 years now or so. There was always these problems I just couldn't fix that were very language based. As soon as I saw the generative GPT models I was like, this is it, this is how we can solve in basket times, pajama times, people notes, like all the things, scheduling, like it was just, I was like, this is it, this is how you do it. But then you have to make it good. And there are these models that are like, we're going to save the world. It's like, no, not yet. You know, we're not that good because there's still limitations to what AI can do, but you can make it pretty helpful. So basically what we did initially, just like other basically LLMs for medical, we got super excited and attached every article we could from PubMed into our LLM. Like everybody did that at first. We built these APIs, we're like, get every single article you humanly can and this will create the best medical LLM. And then we kind of found out that that actually produces garbage. So we had like over 70,000 articles. We had all the articles in the world in there, which is what chat GPT has access to. It could pull anything from anywhere. But then we decided that's, that's actually not good. 
So we started to curate more and more, and the curation process is myself or another colleague looks at it and goes, no, I don't want that, or I do want that. It's simple to a dermatologist, but impossible to a non-dermatologist. This kind of curation process, you just look at a journal and you know you're going to use it or you don't. So we got rid of a lot. So that was the main thing, curating this derm brain. And then we kind of taught it differently, because that's not how we train to be dermatologists. You don't get to residency day one and they're like, here are all the PubMed articles, do your thing, become a derm. Right? We're taught very systematically. So we built our model in that same sense. It really just has a derm brain, it has an infrastructure, and then the articles are really on top of it. It's a supportive element. It's not purely the articles that it goes from. So it's kind of interesting how you approach solving that problem. On top of that we have multi-level agents now, which is also different from ChatGPT or a lot of the other models. So we'll have like two or three layers of agents that check each other, because the LLMs are better at proofreading than providing an answer. You could do this yourself. You could get an answer on ChatGPT, go over to Claude and say proofread this, come back from Claude to ChatGPT, say proofread this, do that two or three times, and your eventual answer might even end up being something close to what DermGPT can do. Not to minimize what we do, but basically we have multiple things set in place. We have cases in place. We have this agent called Derm Guardian that just goes through everything you're doing and makes sure you're not doing something that's going to get you sued. It looks for high-risk cases not to miss, looks for stuff you should put in your notes. So we have all these kind of extra features to it. So this is a cool study.
It doesn't surprise me that the final answer was hopefully better; sounds like maybe it was. It also doesn't surprise me on the article look and feel: possibly they had some higher-level articles, but that article might not be where the right answer comes from for this question. Whereas something like ChatGPT is built to be impressive. There are these other LLMs that say, we bought, like, the biggest journal and that's why we're good, and they're just using name association. But in medicine an answer is either good or it isn't. The branding doesn't really matter to us, whether they had a fancy article or not. In the end you judge the answer.
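The cross-model "proofread this" loop Dr. Kmanger describes can be sketched in a few lines of Python. This is a minimal illustration only, not DermGPT's actual pipeline: the two reviewer functions are hypothetical stand-ins for real model calls, stubbed here with simple deterministic text fixes.

```python
# Minimal sketch of the "proofread ping-pong" idea: pass a draft answer
# back and forth between two models, asking each to proofread the other's
# output. The two "models" below are hypothetical stubs; a real version
# would wrap each vendor's API client instead.

def stub_chatgpt_proofread(text: str) -> str:
    # Stand-in reviewer: fixes a common typo.
    return text.replace("teh", "the")

def stub_claude_proofread(text: str) -> str:
    # Stand-in reviewer: tidies whitespace and ensures a final period.
    text = " ".join(text.split())
    return text if text.endswith(".") else text + "."

def proofread_loop(draft: str, reviewers, rounds: int = 3) -> str:
    """Cycle the draft through each reviewer for a few rounds."""
    for _ in range(rounds):
        for review in reviewers:
            draft = review(draft)
    return draft

if __name__ == "__main__":
    answer = "teh  first-line  treatment is a topical retinoid"
    print(proofread_loop(answer, [stub_chatgpt_proofread, stub_claude_proofread]))
```

In practice each reviewer would wrap a real API call, and you could stop early once two consecutive rounds return identical text.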
A
I'm not afraid to.
B
But that's, that's a long answer to what?
D
To why an answer in a low-impact journal may be a more helpful answer than a tangential answer in a high-impact journal. And we're like, we quoted JAMA Derm and JAAD, oh, we're good, you know, but maybe the right answer was in, like, the American Journal of Clinical Dermatology or something.
B
But it's the right, totally, it's the right answer. And I think that's what these big LLMs are going after. They're like, well, if we just put every logo of every big journal on our site, then that's how you build credibility, which it is, to the viewer. But really, at the end of the day, in clinic at 5pm, you just want the right answer. So you just need to build the back end to do the job you need.
D
So with DermGPT, do you see this as, and it may be more than one: is your goal to have information that dermatologists can use to provide better care, or that patients can use as a resource to find more answers, or to help with the mundane? Like you started talking about, this is where I'm going to go to write prior auth letters. Where do you see your niche really being with DermGPT?
B
Yeah, it's the derms. It's the derms. I was a department chair for five years and I really saw, like, I used to follow what our docs were doing. Many of them were on the EMR at like 10pm doing prior auths and in-basket messages. This is that population, this is who it's for. It's the attending physicians, high volumes. We know we're less than 2% of the house of medicine, but we're many times the first entry point of any healthcare system, because everyone has skin and everyone has skin problems. So our volumes are crazy. The tools that are built for, let's say, primary care, which might see 10 or 15 patients in a day, a tool made for that volume is not going to be what we need when we're seeing 30-plus patients sometimes. Most of our medicines need prior authorizations. There are about five subspecialties like us that are hit hard. Like, in 2012 our burnout numbers were almost non-existent. We were super happy. Now they're pretty high. But that's because we're in that kind of small subset where if you're doing a couple of things inefficiently, you do that times 30. That's a lot, versus maybe a primary care doc might have, you know, 10 of those clicks and a response. So it's really that small niche. We don't solve every problem for everything in the world, but we really want to solve that problem for that dermatologist. The board-certified derm, mostly medical derm, but I think some surgical derms are using us as well. The people seeing 25-plus, 35-plus patients, prior auths, that's the kind of group that's, like, drowning, that we're hopefully trying to help a little bit.
D
So for those of us drowning, can you give us a vision of five years from now? With where the technology is and where you realistically think it'll be, what would our day really look like?
B
I would say it's even here today. We know the data shows that for eight hours of patient-facing time, we are often spending four to five hours on non-patient activities. And that seems to be pretty consistent for the derm group as well. Those are in-basket messages, prior auths, and the notes themselves. And they're all interconnected, because if your note is not complete and doesn't have the prior auth things in there, then your prior auth fails, then you go into denial land. So all these things, they're not actually separate. And then the AI can pull information from your note to answer the patient question. So they all actually live in an ecosystem. We've done many trials now where we've seen, like, a three-hour session of in-basket message response reduced to 30 minutes when AI intervenes. It's not like you're not in the room, you're sitting somewhere else having coffee; you're still involved with it, but it just dramatically decreases that cognitive burden. So it already exists. That's why I always try to tell all our colleagues, even if it's just ChatGPT or Claude and not necessarily DermGPT, and like you mentioned, those models aren't perfect either, even those models are sometimes a little better than what we could do. So just get on any generative AI and get these mundane tasks done. Even if you're like, oh, I'm just going to do it in two, three minutes, make that 30 seconds. If you keep doing that repetitively, you're going to win back a lot of hours.
A
But don't they have to get directly integrated into our EMRs and be proactive rather than reactive? Because ideally, yeah, that's where the big difference is going to happen.
B
I think 100%, ideally. That's the world where you don't have to go from one thing to another site and, you know, type the information again. And it's coming. EPIC is bringing a lot of AI tools, so it's definitely coming. The one thing, though, is the same thing that happened with electronic health records: these processes come in for primary care first. And it does matter how the models are thinking and how they're prompted. They need to understand our language. Even our scribe systems are sometimes difficult, because our physical exam is difficult. The primary care doc just has to note the positives, because everybody has a heart and lungs and, you know, all these things; they just have to say, was there something abnormal there? Our exam's different, right? You're not supposed to have a mole here, but you do. So it's a different way of looking at even our physical exam. And then our words are harder. It's not just lungs, it's, you know, erythematous, blah, blah, blah. Everything is a little bit different. So the tools are coming, but it's going to take a while before they're actually meaningful for subspecialties, which is what I've found over time.
A
So my pathway of using AI in medicine: basically last year, I thought the large language models got good enough that I started to use them for some stuff. Up till then, they weren't usable. And then midway through this year, I discovered OpenEvidence and thought it was amazing, and then decided it was terrible, because there were several obvious things where it gave me the wrong dosing for... And I knew basically the dosing, and I was like, no, what you just told me was wrong. That's not what the package insert says. And then OpenEvidence said, no, this is the dosing, and blah, blah, blah. So then I sent OpenEvidence, in my chat, here's the section in the package insert, whatever. And then, you know, oh, no, the dosing is blah, blah, blah. And then whenever I asked, why did you give me the wrong answer before? It just ignored the question, it wouldn't answer. And that happened more than once for me with OpenEvidence. So I stopped using it, because I was like, if I have to double-check every answer, I'm not going to be using this LLM. And I haven't had that happen in any meaningful way with Perplexity or Gemini or Grok. Those tend to be the three that I use. And the challenge that I see for a company like yours, and this is really where I'm getting to with all this: I can literally see, week to week, month to month, even when there's not a big model upgrade, ChatGPT, well, I don't use ChatGPT, but the other ones getting better week by week, month by month. The amount of resources they're pouring into trying to make the models better, you can't match that on the underlying model. So how do you keep up? Let's say you're better than ChatGPT right now. Six months from now, are you still going to be better than ChatGPT?
B
We're going to be even better than we are now; the delta of how much we get better is going to increase. And the difference is we understand the workflow. This is the crazy thing: now you see businesses of two or three individuals, we have five, but you see the stories of, like, a business of two people with some crazy AI product built on top of the models. You don't need the numbers anymore, because the foundational models are so good that we can build upon them. The big companies have done the work. This work came out of Google, and then OpenAI, of course, was the one that really released it, but a lot of it was Google's work where they actually developed these generative models. That kind of lift, that kind of build, there's no way two people could have done that level of build. But now we can build on top of it, and the cool thing is, as those models get better, we can leverage that too. We can upgrade our foundational model. But the layer we have on top is the workflow layer with the agents, which even a Google cannot understand unless they get, you know, a hundred dermatologists in a room and really deeply talk to them. Of course they'd be able to build it if they got that kind of inside info.
A
But even then, unless you had those hundred dermatologists in a room for six months, you're.
B
Not going to get what we intuitively know. So it's actually, I think, the time for physicians to build, because it comes easy to us. We're like, is that a big deal? But it is a big deal, because other people have no idea what we're talking about. It's just things that come easy for us. But this workflow layer, the agentic workflow layer.
A
Hold on one second.
B
Is what makes us better.
A
Let me make sure I understand your answer. So basically you're saying that the improvement that I'm seeing in, you know, just the basic LLM models, you guys are also benefiting from that. And on top of it, you're benefiting from sort of optimizing workflow.
B
And time and iterating. Yep. We're even way better than we were two years ago because we learn and we iterate. So we're getting better in our workflows: oh, let's actually change this agent this way, because our colleague from this state, like you said, sent in this answer and said this was totally dumb; change this. So we're iterating on that end and making the agents better. And we constantly upgrade the foundational model, so if there's a new update available, that's an easy one to do. So I think it's really democratized what people can build on top of the foundational models.
A
Is the holy grail for you guys integrating directly with EMRs? Because if I've got to click in and out of my EMR, copy and paste, for it to become proactive rather than reactive and really up the functionality, it seems like it has to be integrated directly with my EMR. Am I thinking about that the right
B
Way. That actually would be the holy grail, I think, for the user, because it's very annoying to have to go between one thing and another. But as far as AI building, we can be so far ahead; if you're talking about, let's say, an integration with EPIC, then you actually do need a huge team, because those things are a big lift technologically. Not because it's a difficult thing to do, but because the systems don't quite talk to each other. And systems like EPIC are a little older; they're built on multiple layers. So for two groups to come together and just understand this interoperability is actually really, really hard. These are problems that big groups are trying to solve. CMS right now has put in that most of its payers or payees should be using electronic prior auths, but they can't figure out interoperability. These are really hard things to do. The infrastructure of healthcare is really hard to connect to one another. So that, I would say, is the old-school hard stuff. But for the outside R&D development of AI, if you have the right problem-solving skills and the expertise, you can actually outpace the foundational models. But yes, 100%, ideally everything's in one place; you don't have a tool for this, a tool for that. So hopefully it'll get there at some point.
D
Our UNC EPIC actually does have built-in AI. I mean, in addition to scribing, if a patient says, my skin is red from the cream that you gave me for acne, it'll write back: Hi Matthew, thank you so much. I'm sure that's frustrating that your skin is red. That might be your tretinoin. Some suggestions would be... And they're not, like, super high quality answers, but it is trying to do it, which is great.
B
And it might make it a little easier for you, right? Maybe you don't have to type out tretinoin, which was a win in itself.
D
I don't know. Yeah. It does not go directly out. It comes to me and it says start with draft or write my own. So I could start with their draft and then change it.
A
Cool. Patton, what were you going to say?
C
That was a question I was going to ask. I was also going to ask, Farah, do you have an example of something that maybe a year ago DermGPT was just, like, terrible at, like, oh my gosh, this needs to be so much better? And can you talk us through the process of how you made that happen?
B
Yeah, I would say our agents. We've actually put out more in the last year, because one of the things was our answers were better, just the way our model was a little bit more curated, but it still had errors every once in a while. What we wanted to do was basically have it be 100% correct before it put out an answer. But then sometimes it would not put out an answer, which is also not helpful. So it was this balance of having it give a really good answer most of the time, try to give an answer, but have it be correct, which is what we've really put out. One of our engineers actually has white papers on this, I think has a patent on this model, with a three-tier agent system where the agents checking one another is probably one of the highest ways you can get to almost zero hallucinations, or the 100% correctness we can strive for; like we said too, maybe even better than humans. We're not getting to 100%, but imagine if we had maybe four attendings in a room talking and checking each other; that might come up with a better answer than just one person alone, which is kind of what the agents are doing. So the agentic workflows, and basically what that means is we've given a different job to each agent. It's basically saying, you are an agent for this, I want you to do this. So, like, the Derm Guardian is an agent: you're going to go through every response and just make sure nobody put anything that puts them in harm's way, that they didn't miss, like, a number-one diagnosis. Another thing we've gotten a lot better at: we were kind of following the same as everyone else, where you'd get that laundry list of a differential diagnosis. But it wasn't helpful. You really want it to be like a colleague. When you're talking to someone: what do you think this is? What should I do next? Those are the helpful things.
Not just, here's 10 things it could be; that's useless, right? So that we've gotten a lot better at too: you know what, this is probably this, but it could also be this, but make sure you don't miss this, kind of guiding you like a colleague. So those are a few things we're tightening up on.
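The tiered agent flow described here (a generator whose draft is verified by a checker, then screened by a guardian, with silence preferred over a wrong answer) can be sketched roughly as below. All three agents are hypothetical stubs and `RED_FLAGS` is an invented illustrative list; this is a sketch of the pattern, not DermGPT's actual implementation.

```python
# Minimal sketch of an "agents checking agents" pipeline:
# tier 1 drafts, tier 2 verifies, tier 3 adds a safety screen.
# Every agent here is a deterministic stub; in a real system each
# tier would be a separate LLM call with its own prompt.

RED_FLAGS = {"melanoma", "sepsis"}  # illustrative high-risk terms only

def generate(question: str) -> str:
    # Tier 1: draft an answer (stubbed).
    return f"Draft answer to: {question}"

def checker(question: str, draft: str) -> bool:
    # Tier 2: verify the draft actually addresses the question (stubbed).
    return question in draft

def guardian(question: str, draft: str) -> str:
    # Tier 3: flag high-risk topics so nothing gets missed in the note.
    if any(term in question.lower() for term in RED_FLAGS):
        return draft + " [Guardian: high-risk topic, document your reasoning.]"
    return draft

def answer_pipeline(question: str):
    draft = generate(question)
    if not checker(question, draft):
        return None  # prefer saying nothing over saying something wrong
    return guardian(question, draft)
```

The key design choice mirrors the "please don't get it wrong" benchmark: when the checker rejects a draft, the pipeline returns nothing rather than an unverified answer.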
A
What's your most common use case? Now that you guys have a bunch of things, you know, you've got all the different buttons, like help with the prior auth. What are people using DermGPT for the most?
B
I think the board-certified derms are using it most for the second consult, actually, like your buddy consult, where they ask a question like, hey, I have this and this, what do you think it is? One use case we actually didn't quite think about was the nurse level, the nurse triage level, which is why we ended up adding that RN triage agent. In the offices, what we were trying to do is 10x the derm, like 10x what you can do in a day. But we found out that in an office, everybody also wants to 10x the nurse and the MA. So now we have agents for them. There was once, I think a year and a half ago, when we were doing an update and the system went down, and literally some of the nurses from one of our nearby academic centers called us, like, where's DermGPT? We use it for daily triage. Which was really cool; we're like, whoa, we didn't even know this existed. So we literally built an agent specifically for them. But that's probably one that's used more by RNs, actually, rather than derms: triaging cases. And then the prior auth one, the prior auth denial generator, that's just a super high-yield one, randomly.
The biopsy site one is not one of the top ones, but it's used frequently enough. You kind of put in, like, left cheek near the eye, and it gives you a fancier version of that. And I always tell my med students and everybody going on a rotation: just use this, you know, make your anatomic site a little bit fancier. I mean, I've gotten really dumb over the years too, but I'm a medical derm, I'm not a Mohs surgeon, so the terminology has gone downhill. And then this last year, year and a half, it's funny, one of our Mohs surgeons joked with me, like, Farah, you're getting better. I'm like, that's all DermGPT.
D
Okay.
B
She was getting a lot of, like, left cheek and things like that there for a while, so.
A
All right, well, we are going to wrap it up there, and it's time to move to Patton's trivia. I'm fascinated to see what you've got this week, Patton.
C
It's just AI stuff from, like, history and movies and blah, blah, blah, you know?
A
All right, before you start, you know what drives me nuts? Well, it might be part of your question, so I'm not.
D
I.
A
Go ahead.
C
Tell me at the end. I can't wait. All right, number one: what is the name of the test proposed in 1950 to determine whether a machine can exhibit intelligent behavior indistinguishable from that of a human?
A
The Turing Test.
C
Yep. The Turing Test.
A
That's what I was going to say. So I follow, like, all of this stuff in the feeds, and for years it was the Turing Test, the Turing Test, the Turing Test. The day ChatGPT hit, nobody's talked about the Turing Test ever
D
Again. Well, thank goodness you had a chance to pull that knowledge out and impress people with it. It's good.
C
I'm impressed.
A
It's such an interesting thing, because the Turing Test specifically was: a person can interact with it, and if you ask, is that a person or a machine, they can't tell. That was the Turing Test. And as soon as GPT came out and it was passed, everybody immediately stopped. Nope. The moving goalposts were crazy.
C
Yeah. So I asked a couple of AIs, like, has any AI passed an unrestricted Turing Test with expert judges basically seeking out to find who's human and who's not? I asked Grok and Perplexity, and they said no, there has not been an AI that has passed what they call an unrestricted Turing Test with expert judges. But, again, I'm asking AI that, and maybe that's what the AI wants me to think. So I just can't trust anything now. I can't.
A
Okay.
C
All right. Yeah. So. Named after Alan Turing, by the way. Who? Well, whatever. All right.
A
He cracked the Enigma code.
B
Code, yeah.
C
Then there was that movie with Benedict Cumberbatch called. Oh, what the hell was that called?
A
I can't remember. Beautiful.
D
I actually saw that movie, but I don't remember what it was called.
C
Yeah. Neither here nor there. All right. Number two. In the Alien franchise, what name is given to the onboard computer that controls all of the ship's functions?
B
Oh.
C
There's lots of Alien movies to choose from.
C
There's, like, 20 now.
A
Right. HAL is, like, the only one I know, and that's not from Alien.
C
Correct. It's Mother.
D
Mother. That's it.
B
Yeah.
D
I was like, there's.
C
Yeah, it's spelled MU-TH-UR. They never explained what that stood for, but everyone just called it Mother.
D
That's what my kids call me, so I find it very endearing.
C
It's good, right? Much like HAL 9000, it has secret directives which kind of make the crew expendable, which is also like how you're raising your family.
B
Pretty much. Yeah.
C
So, lesson: in space, don't trust AIs. I think on Earth here, it's okay. All right, final question. The Voight-Kampff test was administered to suspects to identify if they were human or replicants. In what movie?
A
The Body Snatchers.
C
No, it's, like, from the 80s. Harrison Ford was in it.
A
Is it V?
B
Oh.
A
Who was in it?
C
Harrison Ford.
A
Replicant movie.
C
They did, like, a sequel relatively recently. It's basically the same name with a number.
A
Blade Runner.
C
There you go.
A
Has to be. Yeah. Blade Runner. Yeah.
C
All right.
D
Yeah.
A
Okay.
C
I think Matt walked away with that one. Congratulations, Dr. Z.
A
Nerdiness coming out. There we go.
D
A lot of time in front of the computer.
A
That's right. And all of it academic. Farah, thank you for coming on and joining us.
B
Thank you.
A
This was a really fun discussion, and I want to thank all of our listeners for listening along with us. We hope you learned a few things. We hope you laughed once or twice. But mostly we're hoping you're planning to join us next week. And until then, for Derms on Drugs, I'm Matt Zirwas.
C
I'm Tim Patton.
D
And I'm Laura Ferris. And we are Derms on Drugs.
Derms on Drugs, Scholars in Medicine
Episode Date: January 23, 2026
This episode of "Derms on Drugs" dives into the transformative impact of artificial intelligence (AI) on dermatology practice and healthcare at large. Hosts Dr. Matt Zirwas, Dr. Laura Ferris, and Dr. Tim Patton welcome Dr. Farah Kmanger, founder of DermGPT, to discuss recent research, practical use cases, and professional anxieties around whether AI might replace—or simply augment—the dermatologist's role.
“ChatGPT-4 kind of crushed the dermatologist. Patients rated the quality, empathy and satisfaction...higher…88% of the time patients said, I like the ChatGPT-4 response better. So...I’ll be out of jobs pretty soon. And I for one, welcome our new chatbot overlords.”
— Tim Patton (03:26)
“It’s intuitive to deal with it. It's language-based and language is how we communicate… these large language models, the reason they're beating out the doctors, they're created to just be super nice, super customer servicey.”
— Farah Kmanger (06:29)
AI excels at structured communication tasks and longitudinal tracking (e.g., tracking moles via digital dermoscopy).
Image-based diagnostic tools are improving rapidly. Devices can now photograph and analyze every mole on the body in minutes.
Dr. Ferris:
“Longitudinal follow up of that mole...if that person comes back in six months and it takes another photo...there's no way I'm going to remember what that mole looked like. And even our photography currently is not that good.”
(12:29)
Over-diagnosis is discussed—AI may help standardize what really needs attention, and allow dermatologists to focus more on complex or procedural cases.
Study from Mayo Clinic: AI-created video avatars of doctors provided post-op patient counseling; patients reported high levels of trust, engagement, and little “eeriness.”
Acceptability and trust reached 90%.
Quote:
“They could now create a digital twin of all four of us… the LLMs will be able to tell the digital twin what to say and patients won't be able to tell it's not us… It is better than us [at empathy].”
— Matt Zirwas (17:16)
AI avatars and “digital twin” doctors open up 24/7 patient interactions and potentially outperform real doctors on measures of patience and warmth.
Dr. Ferris points out more immediate uses like automated counseling and scheduling.
“DERMGPT answers were preferred...but ChatGPT’s references were more frequently preferred...So basically there were more tight, to the point, better answers out of DermGPT.”
— Laura Ferris (24:40)
“We have these models that check each other...probably one of the highest ways you can get to almost zero hallucinations…like having four attendings in a room checking each other.”
— Farah Kmanger (48:26)
Significant time saved on “pajama time” and administrative tasks.
True integration with EMRs remains a technological and institutional challenge.
Looking Forward:
On AI's friendliness:
“They’re created to just be super nice, like super customer servicey. They’re created for you to want to keep engaging with them.”
— Farah Kmanger (06:48)
On medicine’s trust paradox:
“...People hold self-driving up to a standard of perfection, whereas the right standard is 1% better than humans... Doctors are worse.”
— Matt Zirwas (09:49)
Derailing old fears:
“I think we’re still okay for just a little while though.”
— Farah Kmanger (08:06)
Physician's new role:
“I do actually think derms are safe because we're so procedurally based. ...More of our time shifts to doing procedures. ...I think that's ten years out before that happens.”
— Matt Zirwas (11:23)
On reference formatting wars:
“We should put some effort into making [DermGPT’s references] look pretty to...”
— Farah Kmanger (27:18)
The take-home: AI is already outperforming physicians in some aspects of communication and knowledge delivery, especially for routine queries. Both patients and doctors recognize these strengths but aren’t quite ready to give up the human touch. Specialty-focused AI models like DermGPT are carving their niche by curating medical knowledge, reducing administrative burdens, and integrating tailored workflows. While full replacement is still years away, AI will keep expanding in scope—reshaping dermatology and, potentially, redefining what it means to be a doctor in an increasingly digital world.
Memorable Closing Exchange:
“We’re all going to be sitting at home with our humanoid robots taking care of us and maybe doing podcasts so that we have something to do.”
— Matt Zirwas (14:31)
A lighthearted round of trivia covers the Turing Test, AI in films (“Alien” and “Blade Runner”), with the hosts riffing on how fast conversation about AI has moved beyond simple benchmarks—marking the relentless acceleration of the field.
Finished Listening? The future isn’t coming for your derm job tomorrow, but it’s time to put AI to work for you—before it works instead of you.