Caleb Hicks
Teachers and school leaders realizing, hey, this helps me in my job.
Zach Lipton
We see doctors saving as much as an hour or more a day.
Lee Robinson
Vibe coding is this idea of, it's never been easier to create prototypes.
Danny Grant
I think we just saw a whole new way of experiencing the web.
Host
Hello and welcome to the OpenAI podcast, where we're live from OpenAI Dev Day. Here sitting with me from SchoolAI is Caleb Hicks. Caleb, hello.
Caleb Hicks
Hi. Thanks for having me. This will be fun.
Host
So Caleb, you are working on tools for helping educators and helping people basically in the classroom understand progress of students.
Caleb Hicks
That's right, yeah.
Host
So first off, what was your reaction so far to Dev Day?
Caleb Hicks
A ton of fun, I think. There are a lot of things to be excited about that help us build but also help students and teachers be more creative as well. So that'll be fun.
Host
So what have you been working on over the last year? What has changed with AI that's accelerated what you've been doing?
Caleb Hicks
Oh, I think probably the biggest advancement over the last year for us... So we put AI in students' hands. The main thing that we focus on is safe, managed AI that can act as kind of one-time personal tutors for students. And so probably the biggest change from OpenAI has been model progression. I think we get two advantages from that. One is significant leaps in intelligence and the other one is, you know, improvements in cost. Because we are working with an industry that isn't known for paying big dollars for software, it's been important for us to be able to manage students using this in a cost-effective way. So those have been the two areas where AI progression has helped our work. It has been a lot of orchestration, which I'm sure we'll talk about a little bit, just getting different AI agents and models to work together for the best outputs for students in particular.
Host
So a couple of the releases we saw today, one was the Agents SDK, and you've talked about that. How much have tools changed, one, the ability to work faster and, two, the scope of what you find is possible now?
Caleb Hicks
Yeah, I think we're seeing teams across industries work way faster and build better software because they've got kind of this always-on expert, hyper-senior engineer next to them that they're pair programming with.
Host
Right.
Caleb Hicks
So we see that with our teams as well. And that just allows us to build better software faster and get it in the hands of teachers and students, which is what we're here to do.
Host
What has been the biggest shift you've seen in talking to educators or people like that in regards to AI in general?
Caleb Hicks
Yeah, great question. So every teacher, school and district is on a very similar journey. It starts with permission.
Host
Right.
Caleb Hicks
Two and a half years ago, everyone under the sun was just banning AI altogether. We've moved past that into productivity: hey, this helps me in my job. The leading schools are starting to get into that really important spot, which is recognizing that every student has to know how to use this stuff. If you're graduating from high school and you're competing for colleges or jobs and you don't know how to use AI yourself, you're at a severe disadvantage. So most people are now orienting to, like, yeah, we have to teach this. We think there are a couple of special steps beyond those two where you get to better support students: the AI tutor in every kid's pocket.
Host
Right.
Caleb Hicks
But we think it's got to be classroom connected. It's got to know what you're doing in class, and it's got to know where you're trying to go and help you move in that direction. That last step we get really excited about is how you can put AI to work with teachers, families at home, school leaders, kind of the system at large, to really make school awesome for students.
Host
Could you tell me a little bit about the kind of stack you're using, from teacher-facing and student-facing to the back end, or what you're working on in that regard?
Caleb Hicks
Yeah. From the product side, we have kind of three different parts of the product that students, teachers and school leaders use. So there's just a pretty basic AI assistant.
Host
Right.
Caleb Hicks
The GPT wrapper, as they say, but tuned to use cases in schools. I think a big thing that we felt like was really important is teachers should never have to become prompt engineers.
Host
Right. You know, I was a prompt engineer.
Caleb Hicks
Yeah. So we do a lot of that kind of extra orchestration. We essentially enrich every prompt that a teacher writes to get better output for them, for what they're teaching, their grade level, all that stuff. So there's that. We call that Dot. It's a fun little blue animated character. And then we have tools. It's a form: you fill it out and it gives you an output, a lesson plan, adapted reading content, things like that. We like to call those the 101, like the checkers-level features, that kind of table stakes that you've got to give to teachers for them to move from "Can I even allow this?" to "This is useful for me." But the special part is when you start doing these one-time, guardrailed, safe, managed AI tutors that the teacher can create and give to their students. Students are interacting, and then the teacher gets a real-time dashboard of how the students are doing and what they're doing with the AI. And to make that very concrete: in the last five or 10 minutes of class, a teacher may give what's called an exit ticket to their students. So they've got a recording of everything that they did during class that day, and it loads up and says, like, hey, how'd the content go today? And it does almost what's called a formative quiz. It asks them questions, and then it's coaching them on what they learned, what they want to learn, where they might go next, and tees them up for whatever homework they might have. And then it will do kind of a social-emotional check-in. It will say, like, hey, how was class? What feedback do you have for the teacher? What are you looking for out of using this content? And it rolls all of that up to the teacher so that they know how to better support those students in the future. I think what a lot of us don't recognize about teachers is that they might be working with 300 students at a time. I had 42 desks in my classroom.
Host
So you started off as a teacher before all of this, that informed a lot of what you're doing?
Caleb Hicks
Yeah, so I had 42 desks in my classroom and I was teaching seven or eight periods a day, every day. Teachers have to make this impossible choice. Do I work with the top 10% of students who love this? They get it, they want more of it. They'd stay after school if I let them. Or the 10% of students that are really struggling? Maybe that's because they don't understand it. Maybe it's a learning disability, maybe it's a problem at home. Maybe they got bullied in the last class. Or everyone else? And I care a lot about that everyone else, that middle 80%, because I was one of those students, and most of us were, definitionally. What we think about with what we're building, and what we've been able to build with OpenAI, and with some of the tools that were announced today we'll be able to do even more of it, is giving teachers almost a GPS for impact. Like, these four students really need you today. And you can jump in and support those four students where you maybe wouldn't have even known that they had a concern.
Host
That brings up a very, I think, interesting point. When I talk to developers, often people confuse the tools for the product. And what you have to do is understand the needs of the educator, and that comes from you working in the classroom and your peers working at SchoolAI. And I think that's the thing you're able to bring to it. You look at this as a platform to build on top of, and you've identified all these areas where you can specialize it and make what you're doing very custom.
Caleb Hicks
Yeah, I think something exciting we saw today in the announcements was opportunities for people like me with subject matter expertise, who really know a domain. We're able to build kind of with you all, with OpenAI, not just on top of OpenAI, which is a really cool unlock for a lot of people.
Host
What are you most excited about that you saw today?
Caleb Hicks
The Agent Builder.
Host
Agent Builder, yeah. That looks like it's going to be a lot of fun. Going back to how it was when we first started, you had to just wire up a lot of that yourself, and code tools certainly make that easier. But just being able to drag and drop something like file search... the permission structure seemed really well thought out, particularly for what you're doing in a classroom situation, where you really have to have those safeguards in there.
Caleb Hicks
Yeah, we actually built our own and have for a number of years, and I think we're excited to go and get our hands on it and see what we get to get rid of and what we get to make even better now that we have access to it ourselves, or from you all. I think either way we're going to learn a lot from working with it, and that makes it easier for us to use. But also, again, we can bring it into the hands of these people that aren't technical. They're not developers, they're not thinking about, you know, which model they're using. They just want to get their stuff done.
Host
Yeah, I think that's a great point: the more you can have the people who understand who your customers are, in this case students and teachers, and understand what that problem is, and the less time they're spending wiring things up and running FastAPI servers and stuff to do this, the more it seems like the product can get a lot better, faster.
Caleb Hicks
Absolutely.
Host
What else have you been excited about that you've seen?
Caleb Hicks
I think we've built some similar things in our product. We have this concept called Power Ups, which are basically apps that you can use with Dot, our character. I think we got to see some good patterns and examples of that today with the new apps that were announced. And I think one of the things we're really excited about is that, with these partners that we've been developing, everyone has really solidified around MCP servers as a way to communicate with AI. And so a thing that will definitely benefit us is OpenAI kind of drawing a line in the sand, saying, we're doubling down on this.
Host
Yeah.
Caleb Hicks
And so now when we go to a partner that's already building for integration with ChatGPT, they can also bring that directly into SchoolAI, again for that safe, managed, guardrailed experience that teachers and school leaders are looking for.
Host
Yeah. One of the things I think is going to be helpful too, and you mentioned this before, is evals: the ability, as you run these systems and you run the models, to know how well they're performing. Because 2 or 3% may not seem like a lot, but it can make all the difference, particularly with a student.
Caleb Hicks
When you have 5 million students using your platform, 2 to 3% means a whole ton of issues every day.
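For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each of the 5 million students has one AI interaction per day and that the 2 to 3% figure is a per-interaction failure rate; neither assumption comes from the conversation.

```python
# Back-of-the-envelope estimate. Assumptions (not from the conversation):
# one interaction per student per day, and a 2-3% per-interaction failure rate.
STUDENTS = 5_000_000


def failures_per_day(students: int, failure_percent: int) -> int:
    """Expected number of failed interactions per day, using integer math."""
    return students * failure_percent // 100


low = failures_per_day(STUDENTS, 2)
high = failures_per_day(STUDENTS, 3)
print(f"{low:,} to {high:,} problem interactions per day")
```

Under those assumptions, even the low end is six figures of problem interactions daily, which is why a few percentage points matter at this scale.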
Host
And it's one of those things where every company I talk to knows it's important and wants to do it. But having the opportunity and the time to spend building an eval suite is just often kind of a thing that "we'll do in the future." But having that built into a system...
Caleb Hicks
So it seems like we kind of saw it today, right? "I don't have time to do the evals."
Host
Yeah, exactly. So they did an eight minute.
Caleb Hicks
Eight minutes building an agent. Very impressive.
Host
Yeah.
Caleb Hicks
Take 10 and do the evals and that'll be fun.
Host
Yeah. And it's kind of exciting too for prototyping and spinning things out, because I think a lot of what you're doing is probably going to be experimental, working with teachers to see if these things work. Do you see that accelerating?
Caleb Hicks
Yeah. So one of the things we do: when teachers are creating these custom AI tutors, they do them lesson by lesson. It's like, hey, I'm teaching about the water cycle and I'm going to create this activity that starts out as a tutor and then turns into a game and then turns into a quiz. Just being able to preview that faster helps. But almost having adaptive evals in the moment is one of the things that we've gotten into. It's really the meta-prompting: how do we not just tell the AI what to do, but tell it how to do the thing that it decides to do?
Host
Right. Yeah. It's a very interesting space to be in, where the AI can help you build the prompt to tell it what to do. But knowing where you're going to start from has been super helpful. Where can we look forward to finding out more about SchoolAI and what you're working on?
Caleb Hicks
Yeah, you can follow us on X at GetSchoolAI. You can check out SchoolAI.com. I think this is a fun time of year, with back to school and seeing a ton of how teachers and students are building different holiday stuff. That gets really fun when teachers are sharing with each other. So yeah, I would say on X or Instagram or schoolai.com.
Host
Last question: any advice you'd give to developers about where to start with the tools, from what you've played with so far?
Caleb Hicks
Oh, I think you can probably just start in, like, the GPT builder. For most of the ideas that I hear, just start with the GPT builder and then expand from there. You know, in the developer portal you've got a ton of other tools to just start playing with, and then I think the Agent Builder that we saw today is going to be another fun one to start connecting the dots between all the different tools.
Host
Awesome. Thank you so much.
Caleb Hicks
Yeah, thank you.
Host
Caleb, hope you enjoy the rest of Dev Day. So we're going to be talking to a few other developers about some of their different experiences, and I'm always fascinated to see what gets accelerated and how they're able to focus, when a new tool comes out, on really what is the core thing they're working on. So up next we have Danny Grant, who is with jam.dev, which is an in-browser tool for helping you evaluate your site and figure out basically how to improve it. Is that right?
Danny Grant
I'm so happy to be here. And actually, later today on stage we're going to announce a brand new tool.
Host
Oh boy.
Danny Grant
It lets any PM, designer, marketer... Oh, I'm a little short.
Host
No, no, no, the mic's too tall. It's all good.
Danny Grant
I agree. Well, just like you just did, it helps anyone fix what's broken instantly without writing code. It's called Please Fix.
Host
Okay, so I would use this on, like, my website: go into it and just say "please fix," or...
Danny Grant
Yeah, I mean, like, yesterday, before this tool was announced, if you wanted to change some copy on your site, or a button didn't look good, or you missed a hover state, you'd have to probably ask an engineer to do that for you. And the engineer...
Host
Nobody wants to do that. No.
Danny Grant
Okay. And then they'd be like, can you make a ticket? You're like, okay. You know, and then the ticket gets prioritized, and then, like, maybe they get to it.
Host
I worked for a company, I won't name it, where we joked it was easier to release a model than to put up a blog post.
Danny Grant
That's hilarious.
Host
Yeah, we'll name it.
Danny Grant
So now, as of today, what you can do is, while you're looking at your site, you click on the Please Fix browser extension, and then it lets you edit your site right there like it's a Google Doc, like it's a Figma. And then when you're ready and you like how it looks, you just click submit. And it creates the PR for you and it uses your design system. So engineers like the PRs, they're very clean, they're very tight, very cool.
Host
What were you most excited about from what you saw today?
Danny Grant
I mean, I think we just saw a new way to browse the web. Like, I think OpenAI maybe just changed what we mean by the web, what we mean by a browser. Like, if you think Web 1 is read, Web 2 is read-write, okay, there's Web 3, and then maybe this is like Web 4: read, write, think. I think we just saw a whole new way of experiencing the web where it's a lot less mechanical and it's a lot more stream of consciousness.
Host
So you're talking about the apps inside ChatGPT?
Danny Grant
It's freaking cool.
Host
Yeah, it is. It makes you think a lot about when you're building something, what the core functionality is, like, what the purpose of it is, and the idea that if it's going to be presented inside ChatGPT. And so watching the demo where they interacted with the Zillow website and actually having to have data about that and be able to drill down into it, and it's exciting to think about the possibilities there. And so I could see for you all that seems like a pretty interesting area, because people, as they focus on usability, they're going to want to make sure those things work really well.
Danny Grant
Yeah. Like now, when you build your app inside of ChatGPT, if you're the PM or the designer and you want to tweak some things and make the app look a little nicer, you can just use our browser extension, change it from the ChatGPT interface, and make the PR in GitHub. It's so cool. And I think these tiny tweaks often don't get prioritized today. But they should, because the difference between a fine designed product and a well designed product is that the well designed product changes the world. The iPhone changed the world because it's usable. And there were actually attempts before that, which I won't name, that many people under the age of 30 have never heard of, because they just didn't have that attention to detail.
Host
Have you seen your development process change with tools that were coming out, particularly AI tools?
Danny Grant
Yeah, I mean, I'm biased. We've been using Please Fix as we've been building Please Fix. And even today our pricing page changed dramatically because a PM could just go in and test a bunch of stuff without asking an engineer. I think the thing I'm most excited about is this idea that you can move as fast as your entire creative team altogether. Like no disruptions for engineers. It's like people can move without having to have bottlenecks on a few people.
Host
So it seems kind of cool, because what you're talking about is the idea that people who aren't specifically from an engineering background are able to make those changes. Has that been something you've seen within your company, the adoption of tools like vibe coding and stuff like this for people to contribute and be able to come up with ideas?
Danny Grant
Yeah, it's really cool. We talk to our users every day and some of the stories we hear are awesome. Like last week we talked to a user who's a firefighter and building software for firefighters. That's cool. Talked to a user a couple weeks ago who grew up in the church system and is building software for churches, has no software experience, but is able to now make something that's very impactful to their community. I think that is awesome. I think we are about to see the Cambrian Explosion of software the same way that with the web, there's the Cambrian explosion of news sources with Substack and Twitter. I think this is about to happen for software and I think it's one of the best things for humanity.
Host
So how do things work at jam.dev as far as ideation, testing things with customers, releasing, and then figuring out what the best features are and whatnot?
Danny Grant
The only thing we care about is: does this deliver a wow for our users when they use it? And so that's what we're focused on. There's an emotional response we want people to have when they use our product, because we work on the worst part of software development. Like, I don't know anyone who is like, today I want to fix some bugs. And so our job is just to make that whole part of the software experience a lot better. And so we're laser focused on that. We're just constantly talking to users. Every user that signs up for Jam hears from a co-founder. Every user who uses Jam hears from a PM. We are just in constant contact.
Host
When I worked at OpenAI, one of the most exciting things, and it sounds absolutely silly today, was when we watched GPT-3 spit out a React button. That was like, oh my gosh, it could do it.
Danny Grant
I remember that. I don't remember it inside, but I remember from the outside. I remember seeing demos of that being like the world just changed.
Host
That was the threshold for us being impressed: literally, like, four lines of code. And to be able to say, oh, that's really cool, it can do that. What has been a big aha moment for you?
Danny Grant
I think, look, it's similar to what you're saying, but when you think of what you just described coupled with what was just announced with the Apps SDK, you can imagine that in the future, yes, humans are going to be building a lot of apps that are shown dynamically as they browse the web. But also I think that agents will dynamically build apps for you as you browse the web. And that opens up a whole world of possibilities because you can have dynamic software right there. Like imagine inside of an organization right now, if a PM wants to see how a product is doing, they have to build a dashboard and it's a lot faster to do that today than even six months ago. But imagine the PM needs a dashboard and they're in ChatGPT and ChatGPT just gives it to them. And no human had to do that. It means that there are two types of software. There's sort of long term software that humans are going to work on. They're going to really fine tune, make it great, like Zillow, Canva, and then there's going to be sort of disposable one time use software that an agent can just whip up. And that is just freaking cool.
Host
Yeah, it's a thing where, as a developer, you're used to sometimes spinning up a tool one time to use it, and then you're done with it. But it's another thing to think that could be a new modality. We get kind of fixated on sort of the App Store idea. And I think that there's going to be, like you said, long-term stuff that you deploy inside of ChatGPT, but then the ability for somebody just to spin up a thing they're only going to use once, and that's the end of it. Have you seen, as far as things that start off as hobby projects or things like that, maybe inside jam.dev, people who said, hey, I had this idea, I wanted to solve for this thing, maybe even something from finance or comms or something, that's turned out to be something useful?
Danny Grant
Yesterday we were at the rehearsal for our 10x code base session. It's four startups, we're all demoing, and we're just sitting around kind of backstage talking, and we're like, oh, how did you start your startup? And for almost everyone, three out of four (I'm the only exception), their startup started as an internal tool at their company that they needed. And actually these startups pivoted to the internal tool because they found it so tremendously valuable to how they build software.
Host
That's how Slack got its start, and other companies. It's a very interesting thing where the thing you spend the most time on is often the thing that's going to be the best product. And, you know, one of the things they talked about at OpenAI here today was how 70% of the code, you know, the PRs, is generated by Codex. And it's a very interesting thing to see: when a company is using their own tool to build the tool, it seems to iterate much faster.
Danny Grant
It's actually so funny because if you listen to standard startup advice like what would Paul Graham tell you? It's hey, don't optimize the internal processes like just do things that don't scale, do them poorly. But actually it's when you take a lot of care in your own processes that you can develop these products that can be their own companies and can help a lot of people.
Host
It might be that that is going to be the advantage: how quickly internally you make something. Because I think you're going to see a leveling effect with tools like AgentKit and everything else. Having a big technical bench probably isn't going to matter as much as having good product depth or understanding of your customer.
Danny Grant
Yeah, at the end of the day a human has to use the software, and if a human has to use it, it has to be easy for the human to use. I do think design, still today and maybe forever, makes a difference.
Host
So you used jam.dev on the jam.dev site.
Danny Grant
Yeah. And if you want to see what's brand new, go to jam.dev/pleasefix.
Host
Okay. How often are you guys making updates on your own site?
Danny Grant
Right now we have an update.
Host
You did your pricing page. You did that. So does that become sort of a thing of too often, like, whoa, it's so easy to change it?
Danny Grant
Yeah. We just added dark mode, we did the pricing page, we do copy updates. Yeah. Now it's too easy.
Host
Is there going to be like a "Please Fix, Wait," you know?
Danny Grant
Oh, Please Fix with a polite delay.
Host
Yeah. Or like maybe think about this. Yeah.
Danny Grant
We should add as a feature for engineers that if you made too many changes, it like arbitrarily waits a few days and then asks for a change again.
Host
Do you really want this? Yeah.
Danny Grant
Yeah.
Host
What are you looking forward to? What kind of tools would you like to see that don't exist yet?
Danny Grant
Well, I was talking to the engineers sitting around me during the keynote, and they were really excited about the optimizer and the evals as part of AgentKit. What they wanted to see was: what if the evals could be written automatically for you using your own data, and then used to automatically optimize your prompt? So rather than people sort of sweating the details about these agents, what if the agents could improve themselves? And I think if we had that at Jam, we could move a lot faster and make a lot more powerful software a lot sooner.
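The idea of using evals built from your own data to pick a better prompt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how AgentKit or Jam works: `fake_model` is a hypothetical stand-in for a real model call, and the eval cases, candidate prompts, and substring-based scoring rule are all made up for the example.

```python
from typing import Callable

# Hypothetical eval cases "from your own data": (input text, substring the answer must contain).
EVAL_CASES = [
    ("Button label is cut off on mobile", "button"),
    ("Checkout page returns a 500 error", "checkout"),
    ("Dark mode text is unreadable", "dark mode"),
]

Model = Callable[[str, str], str]


def fake_model(prompt: str, user_input: str) -> str:
    """Stand-in for a real model call; deterministic so the sketch runs offline."""
    # Pretend that prompts asking to restate the issue produce answers echoing the input.
    if "restate" in prompt.lower():
        return f"issue: {user_input.lower()}"
    return "issue received."


def score(prompt: str, model: Model) -> float:
    """Fraction of eval cases whose expected substring appears in the model output."""
    hits = sum(expected in model(prompt, text) for text, expected in EVAL_CASES)
    return hits / len(EVAL_CASES)


def pick_best_prompt(candidates: list[str], model: Model) -> str:
    """Return the candidate prompt with the highest eval score."""
    return max(candidates, key=lambda p: score(p, model))


CANDIDATES = [
    "Summarize the bug report.",
    "Restate the issue, then summarize the bug report.",
]
best = pick_best_prompt(CANDIDATES, fake_model)
print(best, score(best, fake_model))
```

In a real setup the candidate prompts and eval cases would themselves be generated, which is the self-improving loop being wished for here; the selection step would stay roughly this simple.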
Host
What advice do you have to founders or developers right now?
Danny Grant
It's never been a more fun time to build. I think we all just get to enjoy it.
Host
So how did you figure out that this was what your team wanted to work on with jam.dev? How did you decide this was the problem you wanted to solve?
Danny Grant
When my co-founder and I were product managers together at Cloudflare, we worked on the fastest moving team in the company. It's this sort of skunkworks team that would try to do big new things. It's the team that shipped Cloudflare Workers, their cloud compute platform. It shipped 1.1.1.1, which now fields a trillion DNS queries a month. And it was in just trying to move fast on this team that we realized a lot of the frustrations and bottlenecks come from just reproducing issues. And there's no tooling out there, and no user, no PM, no one in sales knows how to communicate to an engineer what the engineer needs to know. And we were like, we can't believe that there's no way to help the engineer get things done faster. We're spending all this great brain time on communicating bugs and hopping on calls and screen sharing and not enough on fixing them. And we thought that we could solve it.
Host
Seems like a lot of things sort of in front of you as far as what you can do. How do you decide what you're going to do next?
Danny Grant
Okay, I think there's something that we under talk about in like startup world, which is if your startup works out to your wildest dreams, you're going to work on it for like 10 years. And that's a really long time. And so you better just love the problem because that makes it really, really fun. And so I think you work on the thing where if you get to wake up every day, work on it and talk to users of the thing, you're going to be pretty darn happy about that.
Host
That's pretty awesome. How can people get started?
Danny Grant
jam.dev. jam.dev/pleasefix, and we'll please fix some code for you.
Host
Awesome. Thank you so much.
Danny Grant
Thank you.
Host
So it's interesting to see what each kind of developer, what they're looking forward to, what kind of excites them as far as that. A lot of that's based upon what they've been working on. I know a lot of devs have to build a lot of internal tooling and when you can solve for that with an SDK, it makes your life a lot easier because you can focus on the thing you want to work with. Up next, we have Zach Lipton, who is with Abridge. How you doing, Zach?
Zach Lipton
Not too bad. How you doing?
Host
Fantastic. So Abridge: you're working on tools for helping the medical community, people doing, like, transcription and that kind of area. Is that how you would describe it?
Zach Lipton
A platform for doctor-patient conversations. Like, if we were to rewind to the state of affairs, you know, post adoption of electronic health records but pre ambient listening, doctors were spending about two hours doing paperwork for every one hour of direct patient care. So it was this kind of clerical burden crisis, and it was a situation where technology was pulling doctors away from patients rather than bringing them closer. What we do is provide this platform that kind of gives doctors superpowers: helps them with their paperwork, does all the note taking in the background, preps everything for them. So all these kind of documentation artifacts are ready for them the moment the visit's over, in exactly the form they need. So they can be fully present with the patient instead of spending all their time staring at the computer.
Host
What kind of metrics do you have so far as far as like time saved for doctors?
Zach Lipton
Sure. It's an interesting story and difficult to track down, in part because there's the time that doctors spend documenting during the day, but the reality of the status quo before was that most doctors actually didn't finish their notes during the workday. So they were home after hours, logging into the EHR and doing what we called pajama time. They're basically sitting there, pulled away from dinner with their families, or logged in after hours, you know, sitting in bed finishing up their notes or reports. So we cobble together this information from a bunch of different sources, but we see doctors saving as much as an hour or more a day. We see, you know, some doctors seeing like 10, 15 patients in a day. We're talking about, like, you know, five to 10 minutes, often, in note taking. So it's a tremendous kind of relief. But even beyond the actual time saved, oftentimes there's an even larger sort of perception of burden lift. And that's because the doctor is worrying about less and able to focus on their patient.
Host
Yeah, I mean, that's great, because, you know, an hour a day is either an hour they can spend with patients or a way to just not have burnout and have more focus. We were talking before with SchoolAI and looking at how they're basically able to help teachers spend more time in the classroom with students. And that's the same problem you're dealing with here: how can doctors be there for patients?
Zach Lipton
Yeah, so we have a channel in Slack that we call love stories. It's where we hear kind of all the feedback from the field, from doctors. And one of the wild indicators that we had landed on something big, and this was relatively early, maybe a couple of months after we launched the first enterprise pilot at a hospital system, was when we started getting stories coming in. And they weren't just talking about the clinical experience, but they were saying, "I actually got to have dinner with my family every night this week for the first time in 10 years," or "Abridge is saving my marriage." And that was not where I was expecting things to land so quickly. But, you know, the problem is that big.
Host
What did you see today at Dev Day that has you excited?
Zach Lipton
Oh, so many things. Maybe like two that are top of mind. One, a lot of people have already talked about the Agent Developer Kit and I think that's extremely exciting. There's this moment right now where I think, it's sort of a proto discipline, so everyone's been rolling their own tools in the hope of kind of trying to figure out what is the paradigm, how does this work. And there's so many things that have to work together. There's the context engineering, there's the prototyping, there's the sanity checking, there's evaluation, there's also, down the road, everything around production and monitoring. So I'm really excited to see where these tools ultimately go. But seeing OpenAI take a strong position and put out a kind of comprehensive offering that brings together a lot of these things, where people have been rolling their own orchestration tools, their own evaluation platforms, we certainly have, and seeing how that is going to create a common platform and allow us to sort of lift off and focus more on the content, is something I'm super excited about. And then in general I've just been extremely excited and drawn a lot of inspiration from all the work that's gone on in terms of software developer tooling. And so we are developing AI powered products for our customers. But we're also big consumers, as productivity tools like Codex play a big role for us. And just seeing how far we've come there, I mean, I remember. So my background is academia. I am an AI researcher and professor at Carnegie Mellon and I did my PhD back in 2010. And I remember when Ilya had a paper, Ilya and Wojciech had a paper called Learning to Execute, and the idea of code going in and a model kind of anticipating the output, the fact that the model was doing anything at all in the space of code, was kind of revolutionary.
And to see where we are now, going from maybe two years ago having our minds blown by just code completion to right now seeing these larger, full codebase refactors taking place, you know, it's kind of amazing to see the progress in the space and I've been excited to follow along.
Host
So when you work in medical, it's a very high stakes area, and so that's got to be something you think a lot about: how you deal with hallucination and also how you deal with customer concerns about these things.
Zach Lipton
100%.
Host
So what do you look forward to in tools? What have you seen be the biggest help in that area?
Zach Lipton
That's an area where we've had to develop a lot of our own technology. You know, what is a hallucination? It's kind of like back in the old days of developing simple classifiers, we had false negatives and false positives. And now kind of everything that's, like, if it's there and you don't want it, it's a hallucination. And the question is, well, what is a hallucination? Sometimes it's completely confabulated information that is asserting facts about the world that are not real. But in the context of medical note taking, medical documentation, order placement, there's a kind of particular situated notion of what we really mean. It's something that's sort of unlicensed by the surrounding context, the substantiating evidence, even if it might be true. You know, if a kind of explanation of a disease shows up in a generated patient facing summary that the doctor never said, you know, even if some of the information might be either correct or plausible, that's not within bounds. So we have a kind of bespoke notion of what constitutes a hallucination. But we are able to often draw a lot of inspiration from what the frontier models can already do out of the box. If we define our ontology of, these are the types of errors we're concerned with, then we can go and say, what is the ability of an out of the box model, given each documentation sentence a la carte, to correctly designate them as belonging to the right category? When we find, okay, we're already within the realm where the model is able to judge, even if it's not able to sort of never commit the crime in the first place, it's able to recognize when the crime's been committed, that gives us a sort of proof of concept. And now what we need to do is make it better, more accurate, cheaper, faster. 
So ultimately create our own special purpose models that are able to take in parallel every single sentence in all the generated documentation and surrounding artifacts and process for each one. Like sort of, does it contain an error of an unacceptable variety, like of what kind? And then a kind of pipeline downstream for remediating. And we're able to do that with about 97% recall at this point.
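[Editor's sketch] The per-sentence check described above can be illustrated roughly as follows. This is only a toy under stated assumptions, not Abridge's method: the `judge_unsupported` function is a hypothetical stand-in that uses naive word overlap where the speaker describes a model acting as judge, and the example transcript, note sentences, labels, and threshold are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    is_error: bool  # hand-assigned label for this toy example


def content_words(text: str) -> set[str]:
    """Lowercased words longer than three characters, punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def judge_unsupported(sentence: str, transcript: str, min_overlap: float = 0.5) -> bool:
    """Hypothetical judge: flag a sentence when too few of its content
    words appear in the transcript. A real system would use a model here."""
    words = content_words(sentence)
    if not words:
        return False
    overlap = len(words & content_words(transcript)) / len(words)
    return overlap < min_overlap


def recall(flags: list[bool], truth: list[bool]) -> float:
    """Fraction of the truly erroneous sentences that were flagged."""
    true_positives = sum(f and t for f, t in zip(flags, truth))
    total_errors = sum(truth)
    return true_positives / total_errors if total_errors else 1.0


transcript = "Patient reports knee pain for two weeks. Doctor recommends rest and ibuprofen."
note = [
    Sentence("Patient reports knee pain for two weeks.", False),
    Sentence("Doctor recommends rest and ibuprofen.", False),
    # Possibly true, but never said in the visit, so it counts as an error here.
    Sentence("Osteoarthritis is caused by cartilage breakdown.", True),
]

# Each sentence is judged independently, so this loop could run in parallel.
flags = [judge_unsupported(s.text, transcript) for s in note]
print(flags, recall(flags, [s.is_error for s in note]))
```

Judging each sentence on its own is what makes the check easy to parallelize, and recall is the natural metric when the costly failure is an unflagged error.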
Host
So do you have any advice for people who are trying to work on basically developing their own evals for hallucination, or just sort of a good starting point?
Zach Lipton
I think it all starts with getting really crisp about what you really mean, and I think that's what we've seen before, is that what is a hallucination for us is a little bit different from what is a hallucination for, like, a general sort of open world QA system.
Host
So is there kind of like a boundary which you keep expanding? For instance, when we talk about medical, say, okay, we feel maybe working in scribing right now is an area that we can probably solve for and produce a pretty good product that's at or better than human level. Then do you look at, like, there's areas in which you would expand out to as you feel more confident?
Zach Lipton
Absolutely. So for us, we've always bristled a little bit when VCs put up this chart and they're like, these guys over here are the conversation agents and these guys here are the coding startups and these guys are the scribing companies. And we've never liked getting pigeonholed as a scribing company. That's because from the very outset the central thesis wasn't just about scribing. The central thesis was sort of about medical conversations, about this being this moment where, you know, this is this magical spot. It's these 15 minutes that the patient waited maybe six months for, where the patient tells their entire story, where the doctor goes through their entire reasoning process, and within minutes after it's over, the patient's forgotten 80% of what happened. The doctor is, you know, hours behind on their note taking. And so for us we've had this feeling that, you know, I've also, as an academic, been working in applications in healthcare, it's my kind of passion area, for about a decade, and I've been watching so many interesting machine learning ideas get developed as a proof of concept only to kind of sit on the floor and not get used. And so what we realized was that the conversation was this way in, it was this important arena where in some ways it's the most important moment in the entire experience of healthcare. And sort of no one was providing value in that moment. And scribing we already knew was going to be a killer application because, you know, it was never gonna scale. Human scribing was never gonna scale to all doctors, but those who could afford it were willing to pay tens of thousands of dollars per doctor per year to have a sort of offshore scribe. And so that already kind of told us there was this clamor for it, there was this need for it, and that we could get in. 
But now that we're in, I'd, like, take this broader view of what is the entire picture, rather than sort of being like, we're quietly in the background, just minding our own business, and then at the very end of the visit, boom, the magic happens. We don't wanna go so far in the other direction that we become interruptive. But from a perspective of just sort of, how do we support a doctor for the entirety of the visit? From sort of their pre charting experience before the patient even comes in the room, through sort of every kind of cue or nudge that they might need during the visit to help them make the best possible decisions, help them tick all the boxes, through to making sure that insurance is going to pre approve the particular test or treatment that they're going to do so the patient doesn't end up wasting like a month waiting for care. So we kind of zoom out, we kind of see this space of the conversation. Like, the point of care is, now that we have ears in the visit and now that you have a sort of AI workforce to sort of do your bidding, what are all these other jobs to be done that could be addressed in the moment? And that includes everything from the sort of pre visit experience through to real time clinical decision support, through all the kind of anticipating and getting in front of all of the sort of financial related documentation that needs to be done to ensure that the doctor gets paid and that the patient gets their care in a timely fashion.
Host
Yeah, you know, on one end you have people working on, you know, the AI scientists and tools and trying to solve frontier problems. I've talked to other people who've told me that if you can have better intake in hospitals, you might be able to get rid of hepatitis, that there's some very low hanging fruits there. And is that something you've looked into or you sort of see a lot of opportunity there?
Zach Lipton
Absolutely.
Host
Any particular area that you'd like to see the tools get better sooner at?
Zach Lipton
Oh, so many. I'd say on a personal note, they're not very funny yet. Okay. I don't know if you've had this experience of ChatGPT, that it's far better at solving hard math problems than it is at making a joke.
Host
Sora though is very good at jokes. I don't know if you tried that yet.
Zach Lipton
Yeah, I think there's a tremendous amount of work that one still has to do to crisply define every single task for the model. And I think about just how high models can come, and I think when it comes to a crisply defined technical task, they've gotten very high in the abstraction chain about breaking it down. But I think when you get outside and start addressing problems that involve interacting with the system, kind of tackling the more real-world problem you're discussing, you find that you kind of have to do all the driving. And the system is a little bit more of an information retrieval system. It is the world's knowledge at your fingertips, but it is not kind of connecting dots at a more abstract level. And so I think, you know, in the coming months and years, I'm excited to see to what extent the model goes from more technical problem solver to a more independent interlocutor on the, like, normative side of problem solving.
Host
Going back, how did you all decide this was the space you wanted to start with?
Zach Lipton
You know, I think we saw a few trends that were happening all at once. Like, my research background was in deep learning, and with any one given approach, we kept running into little plateaus here and there. But if you zoom back and look at the arc from 2012 through to maybe 2018, 19, when the company was founded, you saw there was a path of advances in speech recognition and a path of advances in natural language processing that was proceeding, if anything accelerating. And simultaneously there was a crisis around physician burnout that maybe in 2018 wasn't the number one burning priority on the minds of, like, CMIOs and hospital system CFOs across the country, but, you know, it might have been number five on their priorities and it was rising up the charts. And so we kind of knew there was this coming crisis: doctors were burning out, they were spending more and more time on documentation, they were dropping out of med school, they were graduating med school with no intention to practice medicine, leaving to join tech companies, to join pharma, to do anything but practice. And so there was a churn problem, a retention problem. And at the same time, we kind of knew that the right family of tools was coming into fruition simultaneously. And you know, for us, I think me coming from, I was saying before, this kind of academic research perspective, a lot of us before had maybe operated in the machine learning for healthcare community a little bit on what feels like a cool, important predictive problem, but without maybe connecting all the dots when it came to what were the priorities of the health system, what were the pain points of the health system, what were really the choke points in care, maybe coming a little bit more from what seems like an interesting clinical predictive problem. I think at that moment we had a little bit of a flash of insight, though we didn't know if our timing would be right. 
Keep in mind that in 2018, the typical context length for a language model was maybe, I don't know, 256 words. And these conversations are like 4,000 to 8,000 words on median. But I think we just saw that convergence of a few trends that all spelled that there was this real opportunity to save time for doctors and ultimately, hopefully, you know, save money and also save lives.
Host
So in an area like medicine, which is very high stakes, what advice do you have for developers that are trying to build trust with their product? Because, as you know probably better than anybody, there's been kind of a road of broken promises, of people who were very frustrated by, you know, oh, this is going to do this, and that didn't work. And when you come in with something that says, hey, it really works, how do you win them over?
Zach Lipton
Oh, I think it's never ending. I think trust is earned. Every single day, trust is earned. I mean, there's the initial trust of, like, we've been talking about this vision for a long time and then we actually built the product and got it to work. But there's also the trust that's built through, you know, working with hospital systems, kind of like a high touch, white glove enterprise. And I think through continued delivery on everything from our product commitments, our data security commitments, the kind of service we give people, the continued fulfillment of every kind of promise, and continuing to expand and serve, not just initially maybe more primary care, ambulatory, but now emergency, inpatient, nursing stakeholders. I think this trust kind of accrues through this continued delivery of everything from the product through the experience of the medical teams that are working with us in partnership over the course of, now we're talking about, many years.
Host
Awesome. Well, thank you very much. Appreciate it, Zach. And it's Abridge, if people want to find out more. It sounds like a very exciting space, and I'm excited to see where you guys are headed.
Zach Lipton
Yeah, thanks for having me.
Host
Enjoy the rest of Dev Day. See, it's interesting to see where you have companies that are dealing with very high stakes stuff like medical or education. And a lot of it is the trust building. It's not a thing where you just pop out with a product and you say, hey, we're ready, we're done. You can see from the examples here between Danny and Zach and Caleb how they have to basically just try to show the customers one step at a time, iterate on the product and improve it. And here to talk to us about tools for helping work on this is Lee Robinson from Cursor. How are you doing?
Lee Robinson
I'm doing well. Thanks for having me.
Host
Lee, it's great to have you here. So, Cursor. I probably have about three Cursor windows open right now on my computer that I'm thinking about right now. It has been an incredible product evolution for you all here. And just a little backstory. I was at OpenAI when we worked on the earliest version of Codex and code completions, and I kind of naively thought that, oh well, I just ask GPT-3, or 3.5 at this time, or the Codex model or DaVinci code, whatever, and say just complete it and it's done. And I got my code. I'm like, that's it. Code is solved with AI and we're done. Turned out that's not the case.
Lee Robinson
Yeah, it turns out there's a lot that goes into it, especially from maybe simple text autocomplete, but to where we're at now with fully autonomous coding agents who can self correct and fix their own errors and pull in information from the outside world. And it's wild how much better the coding with AI space has gotten just in the past year, I would say.
Host
And it's a tool very much. What I appreciate is the fact that you guys are using Cursor to make Cursor better.
Lee Robinson
Definitely, yeah. One part of our culture that I think helps us produce a better product is, of course, dogfooding how we build Cursor, with all of our engineering teams, all of our design teams, product management teams using the product and using Cursor Agent to build the actual experience and for every part of their daily work. But then it allows us to give not only hard evals on how a model works, but just what it feels like every day to use the product. You know, sometimes the feedback is, this just didn't feel right. I don't know what it was with this model experience, but we need to tweak this a little bit here. And that is actually just as valuable as an eval that shows that maybe we need to tweak this number a little bit. So the vibes are important too.
Host
How do things work internally as far as adding features? I imagine when you're a company filled with developers building a tool for developers, there's a lot of suggestions on what you should be doing, but how do you prioritize?
Lee Robinson
Yeah, I think across the entire company, anyone has the ability to contribute features, and we kind of measure if a feature gets product market fit internally. So people make changes, they push them to main and they see if it gets adoption. They kind of push it out to company Slack and say, hey, we built this feature, here's what it's good for. We'd love for you to try it out. We've even seen people incentivize with bagels and other pastries to get people to try out the feature, and they get a lot of feedback internally on if it's good, whether they should change it. We also have weekly demo sessions where people show off the cool things that they're working on, and some features don't work out and we end up removing them, and others really take off. People have been wanting this for a while. We build it internally. All of a sudden we look at metrics and we see how many internal devs are actually adopting the feature and what the churn rate is. And if it reaches a certain threshold, we have the conversation, maybe we should actually release this externally. So then we go through kind of a series of steps where it starts internal, then maybe rolls out to some of our ambassadors, rolls out to some of the people who opt into our nightly channel, which is like, you know, releases all the time. And then eventually there's the path towards getting it out to everybody.
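[Editor's sketch] The staged rollout described above can be written down as a small gating function. This is a minimal sketch under assumptions, not Cursor's actual process: the stage names follow the order the speaker gives, while the function names, the way adoption and churn are computed, and the threshold values are all hypothetical.

```python
# Rollout stages in the order described in the conversation.
STAGES = ["internal", "ambassadors", "nightly", "everyone"]


def adoption_rate(users_who_tried: int, eligible_users: int) -> float:
    """Share of eligible users who tried the feature at all."""
    return users_who_tried / eligible_users if eligible_users else 0.0


def churn_rate(users_who_tried: int, users_still_active: int) -> float:
    """Share of users who tried the feature and then stopped using it."""
    if users_who_tried == 0:
        return 0.0
    return 1 - users_still_active / users_who_tried


def next_stage(
    current: str,
    adoption: float,
    churn: float,
    min_adoption: float = 0.30,  # hypothetical threshold
    max_churn: float = 0.50,  # hypothetical threshold
) -> str:
    """Advance one stage only when both metrics clear their thresholds."""
    if adoption < min_adoption or churn > max_churn:
        return current
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]


adoption = adoption_rate(users_who_tried=45, eligible_users=100)
churn = churn_rate(users_who_tried=45, users_still_active=36)
print(next_stage("internal", adoption, churn))
```

In the conversation the threshold triggers a discussion about releasing rather than an automatic promotion, so a function like this would at most inform that decision.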
Host
So one of the challenges in AI now is evaluating models and we can put something into one of the evals, one of the benches or whatever, but say, oh, it performs this well, but it's entirely different when you put it inside of an ide and you have to basically put a harness around it.
Lee Robinson
Right.
Host
So how much time do you all have to spend really evaluating the models to figure out if this is going to be a good fit or not?
Lee Robinson
Yeah, quite a bit. I mean, one of the things that, you know, we love working with OpenAI on is getting access to integrate the model early, working with your team on tweaking the prompts, getting the harness updated as newer models come out. I know we spent a lot of time on this with GPT-5, where we were actually able to delete some of our system prompt, because as the models get better, you don't have to be as precise with the instructions that you're giving them. And the tools are getting better too. So we spend a lot of time not only dogfooding and just trying it for a bunch of tasks internally, but also having many of the engineers internally all try to build new features on the core product experience with new models as they come out. So then you get a good range of maybe smaller tasks, but also real tricky engineering problems, like real gnarly bugs in a very large code base.
Host
Yeah, it's interesting. A couple years ago, people sort of wondered what was going to happen in the model space. Was everything going to be using big foundational models, whatever? But you guys have sort of an all of the above. I can choose a variety of different models in there, I can use Codex, whatever, inside of there. But also you guys are training your own models for the smaller things, like tab completions.
Lee Robinson
Yeah, it's been an interesting journey to watch. The first versions of Cursor were really focused just on code auto completion and it was predicting the next line or maybe the next action. And now we kind of moved from a world of using an off the shelf model to training our own model for autocomplete to now we do online reinforcement learning where we can actually roll updates every 30 minutes to the model, which is pretty amazing just based on the signals we get back from developers, whether they're accepting or rejecting changes that the autocomplete suggests. And that's been really helpful where you can have both these really focused and intentional models for very specific parts of tasks in coding. And, and then as you all make better models, these foundational models, we can integrate them and make that experience really great too.
Host
I'm seeing more and more of my friends who never even thought about coding. Start off with writing something maybe in canvas inside of ChatGPT and then all of a sudden go, oh, how would I deploy this? And then they go, try Cursor and try to do this. And how much have you seen the demographic of the people using your tools change?
Lee Robinson
Yeah, first and foremost, because we dogfood Cursor to build Cursor, we're always building for the professional software engineer. And we're trying to make that experience really great. But as a side effect of making the product easier and more accessible for engineers, it kind of welcomes in this whole new generation of people who maybe were on the fringes of coding or had tried coding in the past, and now it's much more accessible for them to do that. So a lot more product managers, a lot more designers. Our support team uses Cursor quite a bit. And as more of these people have come in and started to use the product, it's allowed us to change actually some of the core features we're building. So one of the things that we're releasing right now is a new way of viewing the Cursor experience where it's less like a traditional code editor and it's more about working with the agents. Because for a lot of people who haven't coded before, you open up an IDE and you're like, what am I looking at?
Host
All this stuff, there's like all this
Lee Robinson
file tree on the left is very overwhelming for people who are just getting started, versus opening up something that looks closer to a ChatGPT. You're like, okay, I know this experience. I've got my agents on the left, I can type into my input text box. And so far we've seen this really resonate with that type of persona where they're graduating into becoming a developer.
Host
I went from somebody who was first copy pasting, before we even had ChatGPT, from the playground or whatever, building some simple little custom tools, to then using an IDE, in this case using Cursor and using tab completions, and now really just sitting down and saying, I just need to write a really good AGENTS.md or description. Have you seen those patterns shift, you know?
Lee Robinson
I think as people get more advanced with working with coding agents, they really start to realize that high quality context in is going to give you much higher quality responses from the models. And we can build a lot of this into the core product itself. But some of it depends on your use case, what you're trying to build, the system requirements. And a lot of this stuff gets put into an AGENTS.md file or gets put into newer features we have, like planning, so you can actually work through a plan mode, go back and forth with the agent to do some research on what you're trying to build, and it can search your code base and kind of help paint a picture. And then when you're handing it off to the model to do coding, it's already got all of this amazing input context, and the quality of the code generated is just significantly better at that point. So we've seen that be very helpful too.
Host
What would you advise for best practices for general?
Lee Robinson
Just adopting Cursor, starting there? Yeah, yeah. I think that for people who are coming completely fresh to using Cursor, and we'll start with more of the professional engineer use case, then we can talk about others. But for professional engineers, I think what we see for most people is their on ramp is they start using the editor just like they would traditional editors or IDEs, where mostly they're using tab and they're doing their normal coding and they're still getting AI suggestions as they write along, and they're using agent to somewhat augment that experience. But they're maybe not coding agent-first. But I think slowly over time, as the models get better and the tools get better, they're graduating then into, what would it look like if I can hand off more gnarly tasks to the agents and maybe run them in the background or go refactor some part of the code base. So I usually recommend that progression for professional engineers. It's kind of the inverse for those who are not professional engineers, because it's very hard for them to drop into code and then know, okay, this is JavaScript. Oh, actually this is TypeScript and this is what a type is and this is what all these fancy words mean. It's much easier for them to take the other approach, where they start with the agent view, they talk to the agent in natural language and the agent outputs code, and then they can ask the agent, what does that mean? What is a const versus a let, you know, what do these things mean? Or something in Python, for example. So generally that's what I recommend.
Host
Where do you think this is going to be headed? We're five years in to GPT 3. You know, the first time I mentioned earlier was that, you know, seeing somebody make a react button was exciting.
Lee Robinson
Yeah.
Host
And here we are now, where you have an entire code base going through. They're doing this. Where's it going to be five years from now?
Lee Robinson
There's still so much of software engineering, not coding, but software engineering, the professional job, that is a lot of mundane, repetitive tasks that engineers are not excited by. There's being on call and having to go through the fire hose of data to figure out how to solve an issue. There's managing all of the incoming bugs that users are reporting and trying to figure out the right ones to work on and actually getting them addressed. There's so much more that goes into how the actual software gets packaged and delivered and shipped than just producing code. And I think in the next, let's call it year to two years, hopefully there will be a lot better solutions across the industry for making that part of software engineering even easier. I imagine a world where you wake up in the morning and you're able to review code that's already been tested and generated. You had customers report issues and they actually got fixed overnight. You could just review the output. Yes, this looks okay. It passes all of our tests. Accept and merge. And a world where maybe on call isn't as painful and maybe code reviews are actually fun to do. I think we can get there faster than what people might expect. And I think both in exploring things, just working with models directly, and then the products around them getting better.
Host
What surprised you the most? What has been your big aha moment?
Lee Robinson
In working with Cursor?
Host
Cursor, Anything.
Lee Robinson
Yeah. I think for me it is incredible how, when you're in the San Francisco tech world, it seems like everyone's coding with AI, right? Like everyone's using these tools. And then you talk to people around the world and you realize we're still so early. We are so early. I mean, five years in from that first React component generated, and for a lot of people, you know, this year is the first time they're really coding with AI. So for the long tail of the industry, and then the next however many millions of people who become developers, it's never been a better time to actually start learning how to code in general, but then also how to use AI to write code and review code.
Host
I was at a college campus, I won't name the campus, and this was last semester, and I asked the students in the CS program, what are they teaching you about agentic coding? What are they thinking about AI code completion?
Lee Robinson
I'm sure nothing. Yeah, nothing.
Host
Nothing. Not even a one day class on it. And it was weird because, like, I get it, learn the fundamentals. If you have, you know, a deeper understanding of Python, these other languages, you're going to really be excellent at this stuff.
Lee Robinson
Absolutely.
Host
But there was none of that.
Lee Robinson
Yeah, the education system is going to have to change a bit specifically around coding because it's evolving so fast, it's being speed run and you 100% need to learn the foundations of CS. That's how you're going to have a successful career as a software engineer. But you need to know the new tools that are changing at this point every few months they're just getting better and better and better. So this is one of the things I work on full time at Cursor is teaching developers and putting out educational material. So we just put out a course teaching devs and hopefully new devs, a lot of these fundamentals. What, what is context? What are tokens? What's, what's a context window like for a lot of people this is, you know, gibberish. They don't know what this means yet.
Host
Yeah, it's still. It struck me as an aside, though, because I was just thinking about, you know, students coming out of there who are going to want to go work in companies, and the companies that are moving really fast, you know, are the ones where you understand that. And you know, my friends who are brilliant computer scientists, who understand it really deeply, who are using these things, get the best of both worlds. They understand deeply how this stuff works, but they also understand how to apply this. Yeah. So, you know, whatever we can do to encourage more of that, I think, in education would be great.
Lee Robinson
Well, just like how ChatGPT has the learn mode built in, I would love. This is a side project maybe, but I really want to build this into cursor.
Host
That'd be amazing.
Lee Robinson
Like, I want engineers, especially those just getting started, maybe a freshman in college, to be able to use the tool and learn as they're building. Because for me personally, I think for most people who write code, applied application use of coding is actually how I learned the best. Like, I can't just read all of the CS textbooks. I actually need to build something with the data structures or algorithms. It's like, oh, then it clicks. So if we can build more of that into the core product experience, I think that would be really helpful.
Host
Yeah. Two of my close friends who were never coders, their first introduction after playing around with ChatGPT was to go into Cursor and to learn inside of there and also understand that, you know, once you're even using it, as is right now, having something that can explain to you what a server does, how to run it locally and do that is an amazing lift. And it seems like that would be kind of a really, you know, great addition to it. Do you feel like we're going to see kind of the vibe coding era sort of just evolve into where we just call it coding?
Lee Robinson
Yeah, probably. I think one of the things with vibe coding that's interesting is certain words just catch fire because they put a phrase to something people have been trying to explain for a while. And to me, vibe coding is this idea of it's never been easier to create prototypes. You can just try out ideas and put it into ChatGPT and get a canvas and see what it looks like, or open up Cursor and ask it to build an idea. And that doesn't necessarily mean that you have to ship that code because there is a long tail of software engineering where it's kind of like the iceberg memes, where it's, like, on the top, it's vibe coding. And then you're like, oh, wait. There's actually a lot more to building, delivering complicated software. Like we saw today with AgentKit. There's a lot that goes into building successful, reliable, observable agents. So I love that it's kind of opening up the top of the funnel and making software creation accessible to more people. And then as they go down, they realize, okay, there's more complexity here. And I think over time, that just evolves into this new world of software that will probably look quite a bit different than the past five years.
Host
Yeah. As a writer, we use the term "seat-of-the-pantser" for somebody who would just sit down in front of the typewriter and just go and tell it. Maybe find a story, or if they're a famous horror author, have to kill everybody off halfway into it and then come to an ending. Or, you know, planners. And I think that sometimes seat-of-the-pantsing is a great way to explore and find out new things. I never would have tried that. And did it. And then when you sit down and say, no, I know what to do. I'm going to go proceed this way. This has been awesome. Thank you so much for joining us, Lee. And people can find, you know, Cursor everywhere. It's pretty awesome. Highly recommend people play with this. And it's been just, like I said, it's like my favorite thing to do is just open up some tabs and just throw some crazy thing in and see if I can make it.
Lee Robinson
Yeah.
Host
Yeah.
Lee Robinson
Thank you.
Host
Have a great Dev Day.
Lee Robinson
Yeah.
Host
Thank you, everybody, for joining us.
Live from OpenAI’s DevDay, host Andrew Mayne brings together leaders from four companies using AI to transform their industries: education (Caleb Hicks, SchoolAI), web development (Danny Grant, Jam.dev), healthcare (Zach Lipton, Abridge), and software tooling (Lee Robinson, Cursor). The episode is a fast-paced, insightful exploration of how OpenAI’s latest features—especially the Agent SDK and new developer tools—are empowering teams to build advanced, user-focused applications, and how these innovations are changing what’s possible across sectors.
Progression of Attitudes: Years ago, AI was broadly banned; now, educators are moving towards seeing AI as essential for productivity and student learning.
“Two and a half years ago, it was everyone under the sun was just banning AI altogether. ... Now, most people orient to ‘yeah, we have to teach this.’”
— Caleb Hicks [02:57]
AI as an Individual Tutor: SchoolAI focuses on safe, managed AI tutors accessible for every student, with orchestration for classroom integration and real-time dashboards for teachers.
“The special part is when you start doing these one-time, guardrailed, safe managed AI tutors ... students are interacting and the teacher gets a real time dashboard of how the students are doing.”
— Caleb Hicks [05:32]
"Being able to drag and drop something like file search, the permission structure seemed really well thought out, particularly for the classroom."
— Host [09:00]
“Teachers have to make this impossible choice. … We think about what we've been able to build with OpenAI is give teachers almost a GPS for impact.”
— Caleb Hicks [06:56]
“Take 10 and do the evals and that'll be fun.”
— Caleb Hicks [11:53]
“Just start in the GPT builder and then expand ... Agent Builder that we saw today is going to be another fun one to start connecting the dots.”
— Caleb Hicks [13:28]
Please Fix Tool: Lets anyone—non-coders included—fix website issues via a browser extension, editing sites like a Google Doc or Figma and auto-generating clean pull requests.
“It helps anyone fix what's broken instantly without writing code ... it lets you edit your site right there like it's a Google Doc.”
— Danny Grant [14:25]
OpenAI’s Impact:
“I think we just saw a whole new way of experiencing the web ... a lot less mechanical and a lot more stream of consciousness.”
— Danny Grant [15:41]
“We are about to see the Cambrian Explosion of software ... it's one of the best things for humanity.”
— Danny Grant [18:06]
Continuous User-Focused Improvement:
“The difference between a fine designed product and a well designed product is that the well designed product changes the world.”
— Danny Grant [16:39]
Internal Tools Becoming Products:
“Three out of four ... their startup started as an internal tool at their company that they needed.”
— Danny Grant [21:20]
“What if the agents could improve themselves? If we had that at Jam, we could move a lot faster.”
— Danny Grant [23:40]
“It was a situation where technology was pulling doctors away from patients rather than bringing them closer.”
— Zach Lipton [26:38]
“We see doctors saving as much as an hour or more a day ... I actually got to have dinner with my family every night this week for the first time in 10 years ... Abridge saved my marriage.”
— Zach Lipton [27:28]
AgentKit & Developer Tools:
“Seeing OpenAI take a strong position and put a kind of comprehensive offering that brings together a lot of these things... allows us to focus more on the content.”
— Zach Lipton [29:34]
AI in Developer Tooling:
Defining Error Acceptability:
“In the context of medical note taking... Even if some of the information might be correct or plausible, that's not within ours.”
— Zach Lipton [31:56]
Custom Eval Pipelines:
“We create our own special purpose models that are able to... process for each one: does it contain an error of an unacceptable variety, like of what kind? We can do that with about 97% recall at this point.”
— Zach Lipton [33:21]
Advice:
“The central thesis wasn't just about scribing. Central thesis was about medical conversations ... it's the most important moment in the entire experience of healthcare.”
— Zach Lipton [35:10]
“Trust is, trust is earned. Every single day, trust is earned.”
— Zach Lipton [42:55]
“Turns out there's a lot that goes into it, especially from maybe simple text autocomplete, but to where we're at now with fully autonomous coding agents who can self correct.”
— Lee Robinson [45:20]
“We do online reinforcement learning where we can actually roll updates every 30 minutes to the model ... whether they're accepting or rejecting changes that the autocomplete suggests.”
— Lee Robinson [49:11]
“A lot more product managers, a lot more designers, our support team uses Cursor quite a bit. … So, we've seen this really resonate with that type of persona.”
— Lee Robinson [50:12–51:07]
“I really want to build this into Cursor. ... To be able to use the tool and learn as they're building.”
— Lee Robinson [57:58]
“There’s still so much of software engineering ... a lot of mundane, repetitive tasks that engineers are not excited by. ... I imagine a world where you wake up in the morning and you’re able to review code that’s already been tested and generated.”
— Lee Robinson [54:18]
“I asked the students in the CS program, what are they teaching you about agentic coding? ... Nothing. Not even a one day class on it.”
— Host & Lee Robinson [56:29]
On AI's Impact for Educators:
“If you're graduating from high school and you're competing for colleges or jobs and you don't know how to use AI yourself, you're at a severe disadvantage.”
— Caleb Hicks [03:13]
On the New Web Experience:
"Web one is read, Web two is read-write... maybe this is like, web four: read, write, think. I think we just saw a whole new way of experiencing the web."
— Danny Grant [15:41]
On Trust in High-Stakes AI:
“Trust is earned every single day.”
— Zach Lipton [42:55]
Describing Vibe Coding:
"Vibe coding is this idea of it's never been easier to create prototypes. ... You can just try out ideas ... That doesn't necessarily mean that you have to ship that code ... There's actually a lot more to building, delivering complicated software."
— Lee Robinson [58:55]
On Non-Coders Creating Value:
"Some of the stories we hear are awesome ... building software for firefighters, for churches, with no software experience ... We're about to see the Cambrian Explosion of software."
— Danny Grant [18:06]
Across education, web development, healthcare, and software tooling, OpenAI DevDay’s tools and SDKs are accelerating product cycles, lowering technical barriers, and empowering users of all backgrounds. Guests emphasized the importance of deep domain understanding, fast iteration, robust evaluation, and, above all, trust—especially in high-stakes contexts. The future, as seen here, is one where AI becomes an intuitive partner for creation, problem-solving, and learning.
Note: Timestamps reference the MM:SS position in the original episode transcript.