Summary (7 min read)

Just Now Possible with Teresa Torres

Episode: Building Lorikeet: How AI Humility and a Dual-Agent Architecture Are Redefining Customer Support

Date: May 28, 2026
Guests: Jamie Hall (Co-founder/CTO), Charmaine (Product Engineer), Rona (Product Engineer)

Episode Overview

In this episode, Teresa Torres welcomes three members of Lorikeet’s engineering team to explore how Lorikeet is redefining AI-powered customer support. The discussion covers the company’s origin story, their iterative approach to product design, the challenges of applying AI to complex support scenarios, and their innovative dual-agent architecture. The conversation offers founder-level candor around what worked, what failed, and the unique product choices Lorikeet made to stand out in a crowded AI customer support space.

Key Discussion Points & Insights

Lorikeet’s Mission and Product (01:24–02:47)

Lorikeet builds an AI customer support concierge aiming to provide highly contextual, effective, and genuinely helpful support—more like a real human concierge than a traditional chatbot.
Unique Value: Lorikeet can directly resolve issues (not just answer queries) by integrating with the systems needed for action, not just replying with knowledge-base excerpts.
- Jamie Hall: “Imagine what it would be like if your bank answered the phone on the first ring and said, ‘Hey, Teresa, are you calling about this flagged transaction that just got blocked?’ ... That’s the kind of future that we’re driving towards.” (02:02)

The Startup Journey: Trials, Failures, and Pivoting (02:57–07:14)

Origins: Lorikeet’s founders noticed that operational and support teams were critical limiting factors in other businesses’ growth, yet underserved by tech.
Failed Ideas: Initially built tools for reflection and workflow tracking, found that while people liked them, ops teams were just too busy to stick with additional tasks.
- Jamie Hall: “We had to push really hard for implicit indirect signals that all of our ideas sucked, that we needed to, you know, figure out something that was actually connecting with a real problem.” (03:55)
Breakthrough: While embedded with a healthcare support team, discovered the burning pain was simply “we’re drowning in tickets—just help us clear the inbox.”
Validation Tactic: Early design partnerships where they asked for “an uncomfortably large amount of money” to ensure pain was acute enough to warrant a real solution. (15:01)
Lesson: Only when solving a deep, daily pain would customers adopt imperfect prototypes (such as command-line spreadsheet systems) as real productivity tools.

Product Evolution: Beyond FAQs to True Workflow Automation (07:56–12:53)

Initial Scope: Started with email (easier latency/timing), integrating into support team workflows via CSV uploads processed through a CLI and returned as spreadsheet outputs.
Deep User Research: Instead of generic “knowledge base” bots, recognized that true support work demands cross-referencing multiple data sources, taking actions (refunds, updates, investigations), and involves nuance and emotion.
- Jamie Hall: “As we looked at what people were actually spending their time on… They weren’t summarizing FAQs … it’s the sort of work where you spend all day with five browser windows open...” (08:36)
Strong AI/ML Process: “There’s really no substitute for … reading conversations and [asking] what was good, what was bad about this? Now do it 50 more times…” (09:12)
AI’s Real “Moat”: Not in proprietary tech, but in relentless iteration, failure analysis, and tight alignment with real user work.

Human-AI Collaboration: The Shape of Modern Support (19:28–23:54)

AI Humility: Lorikeet’s approach is to only automate what’s reliably automatable, escalating nuanced or high-risk tickets to humans.
- Jamie Hall: “We want to have a degree of AI humility… genuinely hard or very, very complicated [tickets] … getting like delegating them to an AI is not the best outcome for a customer.” (22:18)
Uplifting Human Work: As Lorikeet handles routine tickets, human agents spend more time on “hard” support queries, improving overall quality.
Product Evolution: Integrations for ticketing systems, deeper evaluation/diagnosis tools, and the shift into “coach” (coding-agent style) and “concierge” agents.

Integration & Product Footprint (24:10–25:44)

Plug-in Approach: Lorikeet overlays onto existing ticketing systems like Zendesk or Intercom, acting as “just another agent” and minimizing friction.
- Rona: “We’re not trying to disrupt [ticketing platforms]… we’re trying to bring [Lorikeet] as a little Persona, AI agent into the mix...” (24:51)
Competitive Edge: Focuses on AI quality and collaboration, not replacing robust, operator-facing ticketing UIs.

Dual-Agent Architecture: Coach and Concierge (29:32–37:57)

Concierge Agent: Handles customer interactions, responds within existing email/chat workflows, and takes limited actions based on configuration.
Coach Agent: Works with the client (support team managers) to configure, train, troubleshoot, and improve the concierge agent using a conversational, coding-agent paradigm.
- Jamie Hall: “…the way we do config and instruction and feedback now is… something like a coding agent... that’s happening inside our product with this thing that we call Coach.” (29:39)
Config UX Evolution:
- From: Traditional workflow builders with text boxes for prompts.
- To: Conversational interface, “as if you were training a new employee—just tell Coach what you want.”
- New Direction: Defining outcomes (good/bad test cases) first, then having Coach synthesize or update SOPs and prompts based on those.
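
The "outcomes first" direction can be pictured as a small test harness: scenarios with expected outcomes are written down before any prompt or SOP, and each candidate configuration is scored against them. The following is a minimal sketch under stated assumptions, not Lorikeet's implementation. `Scenario`, `run_evals`, `toy_agent`, and the action labels are all hypothetical names introduced for illustration, and the "agent" is a keyword rule standing in for a configured model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """One customer scenario plus the outcome the support team expects."""
    name: str
    customer_message: str
    expected_action: str  # hypothetical labels such as "refund" or "escalate"


def run_evals(agent: Callable[[str], str], cases: List[Scenario]) -> List[str]:
    """Return the names of the scenarios where the agent chose the wrong action."""
    return [c.name for c in cases if agent(c.customer_message) != c.expected_action]


def toy_agent(message: str) -> str:
    """Stand-in for a configured agent: a keyword rule, not a language model."""
    if "refund" in message.lower():
        return "refund"
    return "escalate"


# Assumed example scenarios; a real team would write these from its own tickets.
CASES = [
    Scenario("simple refund", "I'd like a refund for my last order", "refund"),
    Scenario("angry customer", "This is the third time my card was blocked!", "escalate"),
    Scenario("address change", "Please update my shipping address", "update_address"),
]

if __name__ == "__main__":
    # Lists the scenarios the current configuration still gets wrong.
    print(run_evals(toy_agent, CASES))
```

In this framing, the failing scenario names are what a configuration assistant would be asked to fix next, rather than the user editing a prompt directly.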

Evaluation & Guardrails: Customer-Defined Quality (34:43–62:33)

Evals/Testing in the Loop: Lorikeet pushes customers to define scenarios and outcomes (“test cases”) first—not just dump procedures or prompts—then designs workflows around those.
- Charmaine: “Let’s define what the test cases are… then, hey, Coach, now can you build the SOP based on that?” (34:07)
Guardrails: Highly customizable. Default checks (e.g., hostile language) often need tuning for domain specifics (e.g., “weed” for legal cannabis businesses).
- Jamie Hall: “Guardrails lets our users prove a negative, which is otherwise very difficult… with normal language models and evals… you can try your best… Having this cross cutting thing means you can guarantee that if that happens, it’ll be detected…” (57:25)
Failure Analysis: The system traces and diagnoses agent failures, with Coach agent surfacing issues and suggesting fixes, reducing manual error analysis.
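
One way to read the guardrails discussion is as a check that runs on every outgoing reply, independent of the agent's prompt, with per-business tuning layered on top of the defaults. The sketch below is illustrative only: `make_guardrail`, the term lists, and the word-matching approach are assumptions, and the episode does not describe how Lorikeet's guardrails are actually built.

```python
import re
from typing import Callable, Iterable, List

# Illustrative defaults only; a real default list is an assumption here.
DEFAULT_BLOCKED_TERMS = {"idiot", "weed"}


def make_guardrail(
    blocked: Iterable[str], allowed: Iterable[str] = ()
) -> Callable[[str], List[str]]:
    """Build a check that lists blocked terms found in a reply.

    `allowed` removes terms that are legitimate in a given business domain.
    """
    terms = {t.lower() for t in blocked} - {t.lower() for t in allowed}

    def check(reply: str) -> List[str]:
        # Match whole words so a blocked term inside a longer word is ignored.
        words = set(re.findall(r"[a-z']+", reply.lower()))
        return sorted(words & terms)

    return check


# The default check versus one tuned for a legal cannabis business.
default_check = make_guardrail(DEFAULT_BLOCKED_TERMS)
cannabis_check = make_guardrail(DEFAULT_BLOCKED_TERMS, allowed={"weed"})

if __name__ == "__main__":
    reply = "Your weed order has shipped"
    print(default_check(reply))   # flagged under the defaults
    print(cannabis_check(reply))  # accepted once the domain allows the term
```

Because the check sits outside the agent and sees every reply, a violation is detected whenever it occurs, which is the "prove a negative" property described in the quote.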

Human-AI Collaboration Modes (41:33–47:22)

Resolution in the Loop: The agent handles as much as possible but, when stuck, can proactively query a human for specific unknowns before resuming automated handling.
- Rona: “AI is just shy of getting to that answer, but just can’t quite answer it… the human comes in and answers… then we go and focus on feeding that back into the knowledge base again.” (41:33)
UX Challenge: Striking the right balance between chat-, button-, and workflow-driven configuration. Some features (e.g., simulation testing, scenario generation) blend classic UI with conversational elements to reduce user friction.
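
The "resolution in the loop" pattern can be sketched as: try to answer from what is known, ask a human only for the missing piece, store that answer, then carry on. This is a minimal sketch under assumptions. `answer_ticket` and `ask_human` are hypothetical names, and the exact-match dictionary lookup is a stand-in for whatever retrieval a real system would use.

```python
from typing import Callable, Dict


def answer_ticket(
    question: str,
    knowledge_base: Dict[str, str],
    ask_human: Callable[[str], str],
) -> str:
    """Answer from the knowledge base, asking a human only when it has no entry."""
    key = question.strip().lower()
    answer = knowledge_base.get(key)
    if answer is None:
        # The agent is stuck: ask a human for this one unknown...
        answer = ask_human(question)
        # ...and feed the answer back so the same question is automated next time.
        knowledge_base[key] = answer
    return answer


if __name__ == "__main__":
    kb: Dict[str, str] = {}
    human = lambda q: "Refunds take five business days."
    print(answer_ticket("How long do refunds take?", kb, human))  # human is asked
    print(answer_ticket("How long do refunds take?", kb, human))  # served from kb
```

The design choice worth noting is that the human answers a specific question rather than taking over the whole ticket, so the automated handling can resume afterwards.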

The Importance of End-to-End Product Thinking (63:29–67:55)

Product Culture: Lorikeet encourages every engineer to think like product managers; regular sharing from support/customer conversations is expected.
- Rona: “Every week… Jamie asks everyone… what is one thing that you learned from a subscriber... we’re really encouraged to stay in tune with [users] and also perform those UXR discoveries.” (63:46)
Agile Launch Process: Alpha → Beta → Launch, with intentional check-ins at each stage to avoid unnecessary “slop” in the codebase.
Team Structure: 70 employees, a handful of PMs, a few (overworked) designers, and product-minded engineers.

Notable Quotes & Memorable Moments

Jamie Hall (on AI humility):
“There’ll always be a residual of really complicated tickets that we don’t want to touch.” (22:18)
Charmaine (on solving real problems):
“They just started complaining to us about this software they’re forced to use. They hated it. It was our software... That was my origin story.” (17:13)
Teresa Torres (on the real AI moat):
“The first 60% is pretty easy and then the last 40% is like the rest of our lives hard.” (18:14)
Jamie Hall (on prompt suggestions):
“Users don’t care about your org chart... when you pop open a customer support chatbot … with these suggestions ... I’m there thinking, I don’t know, I’ve got a question about this thing that happened… Humans shouldn’t have to care about your org chart.” (47:22)
Charmaine (on customizable guardrails):
“We learned early on that guardrails are quite dependent to the domain that business is in.” (56:43)
Jamie Hall (on product):
“It is very hard to turn billions of dollars into quality software. That’s the essence of our moat. … This is a really hard kind of job to pick the right thing to build and the right user problem to solve.” (67:26)

Timeline of Key Segments

01:24 – 04:15: Lorikeet’s origin & “mom test” failures
07:56 – 12:53: First prototypes & shifting to true workflow support
19:28 – 22:18: The need for AI humility; letting humans handle edge cases
24:10 – 26:24: Integrating with ticketing systems, not replacing them
29:32 – 37:57: Introduction of coach & concierge agents; configuration UX
41:33 – 47:22: Human-AI “resolution in the loop” and collaboration strategies
54:00 – 61:03: How Lorikeet handles LLM overconfidence, guardrails, and trace analysis
63:29 – 67:55: Product-minded engineering and agile launch practices

Conclusion

This episode delivers one of the most detailed, practical, and honest looks at building an AI-first customer support product—emphasizing the “unsexy” but essential work of true product discovery, the value of humility in AI system design, and why UX and continuous adaptation are the true sources of AI moats. Lorikeet stands out for its focus on configurability, customer-driven evals and guardrails, and its vision of AI agents as collaborative teammates—not just replacement workers.

If you’re building with AI, these are the stories for you.

Transcript

[00:04]
A
Welcome to Just Now Possible with Teresa Torres.
[00:09]
B
My name is Jamie Hall. I'm a co-founder and the CTO. And before this I was an AI research engineer at Google Brain.
[00:18]
A
Excellent.
[00:21]
C
Hey, I'm Charmaine. I'm a product engineer here at Lorikeet. My favorite part about technology is where humans interact with technology, especially emerging technology these days. That's generative AI. I'm excited to be here at Lorikeet, learning a lot about how that works and how humans and tech collaborate on that.
[00:42]
A
Yeah, Charmaine, we're going to get along just fine. That's one of my favorite things as well. Rona.
[00:49]
D
Hi, I'm Rona. I'm also a product engineer at Lorikeet. My day to day kind of looks like working in a lot of the big essentials that go around of AI. So kind of going into the how does human interact with AI through things like external integrations and whatnot and configuration.
[01:08]
A
Amazing. This might be the first time I have three engineers on the call, so I'm excited to dig in. Okay, so tell me a little bit about what does Lorikeet do and maybe introduce the product that we're going to dive into. And maybe those are one and the same.
[01:25]
B
Great. We do an AI customer support concierge. So the idea is that it's the best customer support you've ever had: a concierge that knows you, that is helpful for you, and that can actually get stuff fixed for you instead of just fobbing you off with generic answers or that sort of thing. So what we're building towards is a state where, imagine what it would be like if your bank answered the phone on the first ring and said, hey, Teresa, are you calling about this flagged transaction that just got blocked? Let me look into that for you, or something like that. So that's the kind of future that we're driving towards.
[02:02]
A
Yeah. Okay, I will share that. We've had, I think, three different teams on this podcast in the customer support space, which makes a lot of sense. It's like a huge people problem. It's a huge volume problem. We have had one other team, I think it was Gradient Labs, that actually is also focused on finance and regulated space. I think from reading your application, one thing that stood out to me, Gradient Labs is one of the more sophisticated teams that we've actually had on the podcast in terms of AI maturity. And I see a lot of parallels in what you're doing. I think one of the things that stood out to me is the scale at which you're doing voice and I think that that is very unique. Okay. Lorikeet is a startup, is that true?
[02:48]
B
That's right.
[02:48]
A
Tell me a little bit about the life cycle of the company. Like what, where did the seed. Jamie, you mentioned you're a co-founder, so tell me a little bit about the origin story.
[02:57]
B
Yeah, we started with a general problem area which was ops and support teams in general. So this is two and a half odd years ago. We had this intuition that there was an overlooked, unsexy part of the tech sector. Nobody likes building tools for ops teams, but they are often a limiting factor on companies' growth, and it's the sort of thing where once you start thinking about it, you notice it everywhere. It's the main way that people have contact with companies a lot of the time; it's the engine room of what so many companies are doing. And we started, my co-founder and I started building tools to make ops teams more efficient in various ways or to understand their productivity and help them understand what they're working on. And we spent a good few months wandering in the wilderness with terrible ideas, which was surprising to me because I expected that everybody would be telling us no, you're not allowed to do that or something like that and we would have to be boldly forging ahead with our founder vision or something. But in fact everybody is so supportive and so positive that we were just bathed in positive feedback all the time and we had to push really hard for implicit indirect signals that all of our ideas sucked, that we needed to, you know, figure out something that was actually connecting with a real problem that people had.
[04:15]
A
This is the part of the mom test, if people are familiar with that book, which is really just teaching founders how to ask questions in a way that actually gets you that important critical feedback. Yeah. Okay, tell me a little bit about what are some of the things that you tried that just didn't work out before you kind of landed on where you are today? If you're open to sharing?
[04:34]
B
Yeah, no, of course I can share some of our bad ideas so that people know not to follow that pathway again. The one sort of broad area that we looked at was helping people through coaching and reflecting on what they were working on. Because a pretty common pattern with ops and support teams is that they grind really hard and they get stuck in just like, we will smash through by banging our head against it through determination and grit. And if you just sort of walk around it then you can get the same effect. And so helping people kind of manage themselves through that kind of thing. Which it turned out people liked that and they found it helpful, you know, as they interacted with our prototypes. But the core thing is they're really busy people and they've got 12 things to do today and this was the 13th. And so they would use it for a couple of days and then not use it again.
[05:35]
A
If it makes you feel better, I think I've had almost the exact experience with a similar startup idea. I'm really into reflection. Right. And there's this big study that came out that if people just spend 10 minutes a day reflecting on how their day went, they get much better at their work. And I was like, oh, this would be like a really simple service just to like email people every day and have them, like, fill out a journal. And then you could do all these look backs, which, by the way, I do this with Claude Code now, but I, like, built this service and people love the idea of it and then they don't stick with the habit of it at all. And so it was like a very flash in the pan idea. But it sounds like there's a lot of overlap there.
[06:14]
B
Yeah, very similar line of thinking for us. And the other category that we explored was information flow. A very common problem for ops and support leaders is what is my team working on? Like, I have no idea why they're so busy. And so we looked at various ways to sort of figure that out without being too imposing on people's kind of workday. And again, it was the sort of thing where it was like, it was kind of nice and it was interesting and people were curious, but it wasn't solving a burning problem for anybody. And so we spent a few months embedded with a support team at a healthcare scale-up here in Sydney. And so as you ask people, you know, hey, what would be a burning problem that we could solve? The sort of shape of answer is you could answer some of these support tickets and help us clear out the inbox. And so we started building a tool that could do that.
[07:08]
A
Yeah, it turns out when a function is drowning in work, maybe the best way to help them is just do some of the work for them.
[07:15]
B
Right?
[07:16]
A
Yeah. Okay, this is great. So, you know, what I love about this story so far is you started with a target market, you started with a target customer. You churned through a lot of ideas, which is so normal and so expected, even if founders don't always know that before they jump in. And then you started to let your target customer pull you. You almost followed the demand. So you've identified this customer problem of I'm drowning in support tickets, just help me do some of the work. How did you evaluate that this was a problem AI was well suited for? And I know, Jamie, you have, it sounds like, a pretty deep AI background, so maybe do you want to tackle this one?
[07:57]
B
Yeah, once you start thinking about it, it's a pretty obvious application of large language models as they were in 2023. And definitely it's not an original kind of idea for us to jump into this problem space. What we felt would differentiate it was the ability to actually directly help people as opposed to implement some tech. And what I mean by that is the companies at the time were largely specialized in RAG and sort of question answering and getting started quickly with a question answering helpbot. And for some companies that's great, that's all they need. Just answer people's questions and on they go. Right. But as we looked at what people were actually spending their time on in a support team and as we chatted with them and stuff, they weren't summarizing FAQs or reading docs back to people. It's the sort of work where you spend all day with five browser windows open and you're kind of cross referencing between internal reference docs, the admin page, texting somebody to check a logistics order, pasting information back into the ticketing system. And that sort of shape of work was not kind of within reach yet at the time. And so we sort of jumped in, we felt that we could start with that problem and then come backwards to what sort of tech would we need to solve it. And I think that was really a strong lesson that I took away from the AI research days at Google Brain was just how there's really no substitute for kind of jumping in and thinking about the problem itself. Like people would come in to want to join foundational language, wanting to work on amazing new architectures or like, oh, I want to tinker with the underlying model architecture or something. But the actual day to day work is just reading conversations and what was good, what was bad about this? Now do it 50 more times, now do it a hundred times and then like, you know, step back. And, you know, that doing-the-work sort of focus is how we started and I think it's pretty deep in the DNA of the company.
[10:09]
A
Yeah, I love that you're highlighting this. I think in month three of building my first product I came across Hamel Husain and Shreya Shankar's AI evals course. And Hamel has this mantra of look at your data. He's just a data scientist. Look at your data. He says it a million times. And I think what you're describing, Jamie, is like, you can't really build an AI product, period. Not even a good one. You can't build one if you're not looking at what is it doing? And we tend to, like, I kind of giggle now when I hear people ask all these questions about what tools are you using and have you tried the latest model? And I'm like, I literally spent all day looking at 50 traces and digging in. That's the work, right? Okay, so you found this customer problem. We want to help them do support tickets. It sounds like this was two. Did you say two and a half years ago?
[11:01]
B
Yeah, that's right.
[11:02]
A
Okay. So pretty early days. And it sounds like one of the things that you really uncovered is it's not as simple as, here's a knowledge base, go look up the answer and reply. There's more complexity, and this resonates with me. Like, I have a small business and even my customer support tickets, an email comes in. We have to look them up on our student database. We have to look them up in Stripe. We have to write whatever before we can even answer a simple question. So let's talk about, like, day one. What was your earliest prototype? How did you start to evaluate that this is something we could help with?
[11:37]
B
We started with emails because, well, for one thing, the latency pressure is a lot less. So an email reply in, like, less than an hour is considered excellent, or was considered excellent in the days of human support teams. And that sort of choice of channel made it a pretty easy place to get started and figure out how we could get all of the plumbing connected to actually start resolving people's issues instead of just fobbing them off. And that's about when Charmaine joined us to come and get that first machinery going, with email first and live chat with people following soon after. And I think at that time, there was a lot of just experimentation with different ideas for how best to tackle this. We initially started, because of the limitation of LLMs at the time, with a real sort of flowchart focus, basically to limit the blast radius of what an individual LLM was able to get wrong or go off the rails with, and then have a conversational flavor to it over the top. But, yeah, that took a lot of blood, sweat and tears to get that working in the early days, I remember.
[12:54]
C
So email, we had a command line interface and spreadsheets. That was our first prototype. Literally, that first customer we worked with in the healthcare, health tech space, they would send us emails in CSV and we'd process them through our little agent through a command line interface. We'd send them the results back and we'd chat over email or Slack with these customer support teams to check: was that good? Okay. And then we'd iterate. That's how we iterated in the early days.
[13:25]
A
You know what I love about that's like your earliest like eval data set, but with real customer data. That's amazing. And Charmaine, so you joined pretty early on, it sounds like. Tell me a little bit about your background. Did you have experience working on AI products? Was this new to you?
[13:43]
C
So before generative AI, I was actually at a company where they were building just, old school now, machine learning. So I was working with a company doing computer vision type stuff for evaluating big assets like oil rigs to see, hey, this oil rig, these pipes are in trouble, go fix them. So I was working with that team. They were deep in. I was on the team that made the AI team's stuff work for the users. Okay, that makes sense. Yeah, yeah, I've had that kind of experience with AI, as in making it work for users. And after that I spent some time at Dovetail around the time generative AI was picking up. So Dovetail, customer insights platform. They were starting to see how generative AI could help go through transcripts of customer interviews and come up with insights.
[14:38]
A
This is a problem I know well. Yeah, okay. Yeah, excellent. Okay. So I love this early prototype of command line interface spreadsheets. You've been mediating with a customer, which is amazing. How did you find that first customer and how did you get them to agree that this was a product that they should be using?
[15:02]
B
So we'd been working with them for a while. They were an early design partner for our terrible prototypes about ops teams tooling. And they were an easy sale because we could see the pain that they were dealing with every single day. And so "would you like this pain to go away?" was a pretty easy sales pitch. And we had started by getting market validation through design partnerships where we would deliberately ask people for an uncomfortably large amount of money to validate that our ideas were actually valuable to them. And the conversations would go on and there'd be a bit of back and forth and thinking and stuff. As soon as we started exploring this area, the tenor of those conversations changed immediately. If you say to people, would you like your support experience to be better, faster and cheaper, then they say yes. And so we focused in on building this product as quickly as we could rather than getting the sales validation anymore.
[15:57]
A
Yeah, this conversation is turning into such a gold mine for startup founders of like, how to. Partly why this spreadsheet plus command line interface jumps out at me so much is it's such a clear indicator you found the right problem to solve. Like, when you find a problem that's this painful for your customer, they will put up with the command line interface in a spreadsheet because it's actually helping them already. And so this is amazing.
[16:22]
B
And yeah, that was really at the core of our thinking for finding the core problem that people might find valuable. And Charmaine has a great story from her previous work on product engineering that radicalized her about actually solving real people's problems. Or, sorry, correction, actually solving real problems for people.
[16:39]
A
Yeah, Charmaine, you're going to have to tell us the story now.
[16:42]
C
I guess the main origin story was when I was back when I was a baby engineer and I was just having fun learning about all these cool tech things. Microservices. What? Anyway, two years into this job building this thing, we went to a Christmas party, big department wide Christmas party. We met these people from a different team and they just started complaining to us about this software that they're forced to use. They hated it. It was our software.
[17:13]
A
Oh,
[17:16]
C
that was my origin story. That made me realize, actually, yeah, the tech stuff is fun, but what even are we building? Is it actually useful for our customers?
[17:27]
A
Yeah, that little sound bite is the best. I might have to steal that as an argument for why engineers should be involved in discovery. Right. Like, when you hear it firsthand, it just has. It lands a little differently.
[17:41]
C
Yeah.
[17:42]
B
I love that story so much. And it really harmonizes with how we think about AI product building, which is like, it's not magic. You know, people, like investors sometimes ask us, oh, do you have a moat? Like, expecting us to say, oh, yeah. Inside this safe, we've got the secret formula for how to do AI, which is like nobody else has, but, like, it's just product. It's just a different way of building. It's just a new way of building product and software and all of the same principles of finding people's real, real pain and helping them with it. All of those principles apply.
[18:15]
A
Yeah. This is something that. Okay, so Charmaine, you shared, you worked at Dovetail that was starting to get into interview insights. One of the products that I build is AI generated Interview Snapshots. So taking a transcript and really extracting opportunities and then using AI to generate opportunity solution trees. And I'm like three months into working on these problems and I spend, like, an inordinate number of hours working on a single failure mode, just trying to bring the error rate down. And then there's, like, 20 of those. And people ask me all the time, can I just throw my transcripts into NotebookLM? And I'm like, you can, but if you look at it, like, it's more than a full time job to make it good and that's the moat, right? Like it's not the magic of the AI, it's literally the data science of what's going wrong, how do we fix it, how do we iterate on this? And I think people don't quite understand that complexity until they get their hands dirty and get into it and realize, well, the first 60% is pretty easy and then the last 40% is like the rest of our lives hard.
[19:29]
C
And that's what really fascinates me about the space of AI. Because joining Lorikeet we learned really early on that yeah, that is true, AI can't get you 100% of the way. But that remaining percent, how do you make that good? As in how do you design systems so that humans can easily work with AI to get that extra 20, 40?
[19:52]
A
Yes, absolutely. Okay, we're going to dig into all of this. I want to get Rona into the conversation though. Rona, do you want to share a little bit about your background and like when you joined in on the Lorikeet journey?
[20:03]
D
Yeah, sure. I joined a little bit later, which is I think around a year into when Lorikeet kind of started up. So at that point I think we'd already got a few subscribers (we call our clients subscribers) on board. And so a lot of the challenge there was scaling that upwards.
[20:22]
C
Right.
[20:23]
D
And so we did a lot of involvement in how do you make the data that were in spreadsheets and that were like accessible through CLI into a more digestible state that's much more visible and that people can go and like self service a little bit more. I think that's when we like really dug into the self service part. And one of the first projects that I worked on was with actually with Charmaine, which is actually kind of digging into, step by step diagnosing what actually happened within a ticket with a customer. So really bringing those useful points. Right. Because that's Actually a part of the problem which is how do you go and curate and diagnose and identify the really important parts to go and surface not just everything in one go, just dump it in front of the human agent to go solve. So that was one of the biggest projects at the beginning that I worked on.
[21:12]
A
Yeah. So let's get into the evolution of this product a little bit and then we'll fast forward to where you are today. I know from talking to other teams that are working on a similar space and especially with the Gradient Labs team, and you alluded to this, Jamie, I think in your intro, there's this like easy bucket: have a RAG step, build a knowledge base, answer the easy questions. And then I think most teams have shared the next step is like you very quickly have to let the agent take actions. Like a lot of support tickets can't be solved unless you can refund something or you can change your order, whatever the case may be. They have to be able to take an action and that's often like a big next step. And then I think there's this next layer of like, things might go wrong, when to escalate to a human, you might need another agent to weigh in. Is this resonating with your journey? It sounds like also in your journey is a voice step. So I want to hear about that too. Can you maybe just highlight what you feel like the big evolutionary milestones were and then we'll dig into how it works today?
[22:18]
B
Yeah, I think we tackled all three of those things at the same time from the start. As in, we've always thought that we want to have a degree of AI humility about what we're doing. As in, a lot of these support tickets coming in are genuinely hard or very, very complicated or, you know, involve a lot of human emotion or whatever. And delegating them to an AI is not the best outcome for a customer. And so we think there'll always be a residual of really complicated tickets that we don't want to touch. And in fact we found with our fully scaled companies that are at altitude now, average handle time of their human operators has actually gone up. So they're spending more time on customer support tickets. And the reason is a very good one, which is they are devoting the time to the really hard tickets instead of having to try to get through them as quickly as possible so they can slam through this inbox that's full of routine stuff as well. As we've evolved, a lot of it has been, like Rona was describing, just building in more and more of the nuts and bolts of, you know, evaluating and diagnosing what's going wrong in particular tickets and how we can improve them. Getting context back and forth between ticketing systems and our platform smoother and smoother so that we can scale efficiently. So I think that was the shape of our trajectory until the radical changes recently with Voice and now with our Coach product, which is the sort of coding-agent-era change to how we build.
[23:54]
A
Okay, there's a few things in there I want to follow up on. So you mentioned integrating with ticketing systems so you're not the platform that your customers are using to respond to tickets. Tell me a little bit about how does that work?
[24:10]
B
Rona's a resident expert. We are the platform that responds to the tickets, but we do kind of integrate with people's existing setup.
[24:17]
D
So our thinking there is obviously our meat of the product is like the actual AI conversational part.
[24:24]
C
Right.
[24:24]
D
And so like a lot of the ticketing systems and they've been around in the market for ages. Like, we're talking about Zendesk, we're talking about Intercom, that have put so much like, work into establishing that human operation facing UI and so things like queue management and whatnot and like being able to go and provide the human agent with triage material and tools and whatnot. We're not really focusing on that part per se. We're focusing on the actual meaty conversational AI quality part.
[24:51]
C
Right.
[24:52]
D
And so that, plus human agent teams are already used to all these operations. Right. So they've lived in Zendesk for ages. They've lived in Intercom for ages. We're not trying to disrupt that. We're trying to bring Lorikeet as a little persona, an AI agent, into the mix where we go and collaborate with the human agents rather than take over and say this is the platform they have to use from here on out, because that's just going to cause friction.
[25:14]
A
Okay. Since you brought up Intercom, I have to ask. They have Fin. So does Intercom allow you to integrate into their platform? They don't see it as competitive?
[25:25]
B
Yeah, look, we are competitors on the AI level, but a lot of people use Intercom as a ticketing platform. And honestly, it's a great ticketing platform. I think if you're looking for one, I would recommend it. We feel that our AI agent is much, much better. And so a pretty common pattern is for people to plug us in, but that's how it's playing out at the moment.
[25:45]
A
Interesting. So it sounds like I can use whatever ticketing platform I want and then your agent is on top of that. Okay. What I love about this is it reduces your product footprint and allows you to just focus on the hard bits. Yeah. Amazing. For people that are just listening, Charmaine gave me a giant thumbs up, and I appreciate the enthusiasm. Okay, so let's fast forward a little bit. Give me a high-level overview of how the agent works today. I shouldn't even say agent because I know there's more than one. Give me the overview of the product. How does Lorikeet work today?
[26:25]
C
So starting off with how Rona said, Lorikeet is like just another agent on your team that sits on top of your existing ticketing system platform. What customer support teams do then is they have to teach Lorikeet how to talk to their customers. That make sense?
[26:45]
A
Yep.
[26:46]
C
And how that works is that you can give it a bunch of instructions, a bunch of tools to look at. Here's the admin page to go look at. It's like how you would train a new starter on your customer support team. And it has different capabilities. You can give it its personality. Some of our customers want "super happy, use some emojis when you respond" and others want "be super serious and formal and concise." So yeah, you teach Lorikeet how to respond to your customers, you build the confidence that it will do well, and then you launch it live. One of the key things with Lorikeet also is the default. Jamie mentioned AI humility. If Lorikeet doesn't know how to answer, or it's not quite sure, by default it will hand off to a human.
[27:38]
A
Is your product almost like just another user in somebody's Intercom account or in somebody's Zendesk account? What comes to mind is this is a little bit like the Devin model. So instead of me using Claude Code to have it write code for me, I'm spinning up Devin to go do it. I'm talking about the Devin coding agent, where the whole product is designed around it being another engineer on your team. You just give it tasks and it does things. It sounds like this is analogous. Is that right?
[28:06]
C
That's true. And with a new engineer on your team or a new customer support agent, the key things are how do you teach it how to do the thing, and how do you review that it's doing well, especially at scale, because it does stuff so much faster than the real humans that you used to review manually.
[28:24]
A
Yeah. Okay. And I can imagine, I can see how it's not that you're integrating with Intercom, it's that you're giving Lorikeet access to Intercom data because it's just another user on Intercom or another user on Zendesk. Yep.
[28:37]
C
Yes.
[28:38]
A
Okay.
[28:40]
B
I was just going to add that the extra dimension in addition to the Devin model is that the way we do config and instruction and feedback now is through effectively something like a coding agent. And all of your listeners who've used Claude Code and Devin and the others on a daily basis would be familiar with that kind of cooperative experience of here's what I want in broad terms, and then the agent goes and figures it out and proposes solutions to you and makes tweaks on your behalf. And that's happening inside our product with this thing that we call Coach. And then our concierge agent, which is talking to the general public, is a lot more locked down and has cross-cutting guardrails and all of that kind of stuff. Because if some random member of the public is coming in to claim that they need a refund for something, then you don't necessarily want that super helpful, cooperative problem-solver persona to be just immediately doing it for them.
[29:33]
A
Okay, I'm glad you brought this up because this was the second thing you said earlier I wanted to come back to, which is you mentioned this like coding agent model that you're moving into. So it sounds like you have the concierge agent that's participating in the ticketing system responding to tickets. And now you also have a coach agent that's working with your customers. So tell me a little bit about what is that coach agent doing?
[30:00]
C
I wonder if I can start with the backstory of how configuration used to be. Yeah, like how the UX was. So again, this is the part where customer support teams need to teach Lorikeet how to talk to their customers. In the beginning we had a classic web app. We still have it. We have a web app where you can configure basically, hey, when a customer asks about this kind of question or topic or intent, here's a workflow, here's the standard operating procedure you need to follow to answer that question. And like Jamie mentioned before, there would be a mix. You can tell Lorikeet, hey, for these bits, yeah, use AI for that because I want it to be conversational. But for these other bits, please just do good old-fashioned deterministic code. Because I don't need AI to check whether the customer is premium or not. That's just true or false. So we had this interface. It looked like a workflow builder where some boxes are not AI and some boxes are AI. For the AI boxes we had the classic, you probably still see it these days, giant text box. You stick a prompt in there and tell the AI what to do. And then we had one of our customers come to us saying, hey, what do I stick in here? You're the prompt experts. You tell me what to put in this text box. So from that we evolved into kind of a hybrid of big text box and just classic UI of toggle this, or give me a list of what you want Lorikeet to ask the customer. So, yep, giant text box, then a hybrid of prompt text box and UI. And what we're finding now, though, is that with the workflow model, where you build the building blocks, sometimes that's not as easy for people to configure. So the funny thing is we reverted back to one giant text box again, like the coding agent model, where you give it instructions of, hey, here's how you answer when a customer asks for a refund. But now with the better models and some. 
Oh, so a mix of that plus guardrails, as in, aside from this giant blob of text of what to do, AI, here are the rules that you definitely need to follow and cross-check yourself against before you respond. So there's a mix of instructions, the SOP, and here are the definitely-don't-fail-on-these things. Yeah, but the thing is, there's a lot of features and all of these things to configure and make sure it's working well. So what we're moving towards is, hey, what if this Coach agent can just do that for you and guide you through that? And so now we have this mix of a conversational interface, where you just chat to it and tell it what to do, but you still have the UI to help you review that it did what you asked.
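[Editor's sketch] The workflow-builder idea Charmaine describes, where some boxes are deterministic code and some are AI, can be illustrated roughly as below. This is a minimal sketch and not Lorikeet's implementation: the Step type, the step names, and the draft_reply function (a plain-Python stand-in for an LLM call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]


@dataclass
class Step:
    """One box in a hypothetical workflow: deterministic code or an AI step."""
    name: str
    uses_ai: bool
    run: Callable[[Context], Context]


def check_premium(ctx: Context) -> Context:
    # Deterministic box: a plain true/false check, no model involved.
    return {**ctx, "premium": ctx.get("plan") == "premium"}


def draft_reply(ctx: Context) -> Context:
    # AI box: stand-in for an LLM call (assumption). A real system would
    # send a prompt plus this context to a model instead of formatting text.
    tone = "priority" if ctx["premium"] else "standard"
    return {**ctx, "reply": f"[{tone}] Thanks for reaching out about your refund."}


REFUND_FLOW: List[Step] = [
    Step("check_premium", uses_ai=False, run=check_premium),
    Step("draft_reply", uses_ai=True, run=draft_reply),
]


def run_workflow(steps: List[Step], ctx: Context) -> Context:
    """Run each step in order, threading the context through."""
    for step in steps:
        ctx = step.run(ctx)
    return ctx


result = run_workflow(REFUND_FLOW, {"plan": "premium"})
print(result["reply"])
```

The point of the split is that the premium check never touches a model, so it cannot be talked out of the right answer, while the conversational part stays flexible.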
[32:56]
A
Yeah. Okay, so this is making me think of another episode we did with this company called Nepal and Nepal basically make. It's called Nepal because it's your AI people that work at your company. Right. And they. And they also started in customer support. They do some back office operation stuff as well. And they ran into this problem of they didn't really want to give their customers a workflow tool because their customers, I can't remember, I think I can't remember if they were in E commerce or restaurants, but they were not tech people. Right.
[33:25]
C
Yes.
[33:26]
A
And so they were like, we can't give them a workflow interface at all. We're just gonna. And their model was, you're gonna train your Neeple the same way you would hire a new person.
[33:37]
C
Yes.
[33:37]
A
And so, like, they were like, all these companies have standard operating procedure binders. Just take that content, and we'll figure out what to do with it. So it sounds like you kind of ended up in the same place.
[33:49]
C
So I think, though, what we're going towards in the. So there's that. The problem we're seeing, though, is that, yes, they have SOP binders. Some of our customers don't have it at all. It's in their heads. The other thing is, even the people with SOP binders, it's not always correct and up to date.
[34:06]
A
Yeah.
[34:07]
C
So actually, yes, one of the things we do is, yeah, give us your SOP and we'll figure it out using Coach. But what we're moving towards is actually flipping it, as in, let's start by getting people to define what is good and what is bad. And this goes towards evals, which I don't know if you want it. It's more. Let's define what the test cases are, as in, when a customer asks about a refund, what are the scenarios where we should be escalating? What are the scenarios where we shouldn't be? And then, hey, Coach, now can you build the SOP based on that?
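[Editor's sketch] The "flip it" approach, defining good and bad cases first and then building the SOP against them, resembles test-first development. The sketch below assumes a made-up refund policy; the Scenario fields, the thresholds, and candidate_sop are illustrative only and are not taken from Lorikeet.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Scenario:
    """A hypothetical eval case: a situation plus the expected decision."""
    description: str
    amount: float
    days_since_purchase: int
    should_escalate: bool


# The support team writes these first, before any SOP exists.
SCENARIOS: List[Scenario] = [
    Scenario("small refund inside the return window", 20.0, 5, False),
    Scenario("large refund", 500.0, 5, True),
    Scenario("request outside the return window", 20.0, 45, True),
    Scenario("exactly at both limits", 100.0, 30, False),
]


def candidate_sop(amount: float, days_since_purchase: int) -> bool:
    # A candidate procedure (the kind of thing Coach might propose):
    # escalate large or late refund requests to a human.
    return amount > 100 or days_since_purchase > 30


def evaluate(policy: Callable[[float, int], bool],
             scenarios: List[Scenario]) -> List[Scenario]:
    """Return the scenarios where the policy disagrees with the expectation."""
    return [
        s for s in scenarios
        if policy(s.amount, s.days_since_purchase) != s.should_escalate
    ]


failures = evaluate(candidate_sop, SCENARIOS)
print(f"{len(failures)} failing scenario(s)")
```

Any later change to the SOP can be re-run against the same scenarios, which is what makes the cases, rather than the prose procedure, the source of truth.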
[34:44]
A
Interesting. Okay, let's talk through this. Like, I'm. This example you just gave of a refund. How are. Like, is the customer deciding, I need a refund flow? And do you have a template where they're picking a refund flow? And then what happens next? Like, what you just said, Charmaine, is very intriguing to me. I want to understand it better.
[35:05]
C
There's actually different ways to get into a refund flow, because we have this one fintech customer where their customer base is teenagers, and when they ask questions, it's just, where's my money? Yeah, what money? How do you know to get into the refund flow if they're just asking, where's my money? So there are different ways to get into it. One way is that we actually have Lorikeet disambiguate, as in clarify: Lorikeet, if you don't know what flow to go into, go and talk to the customer more. And based on the business context of each customer we have, it knows what kinds of questions to ask to go down into the refund flow. The other way we do it is just based on what the customer is saying. If they are more clear about it, then, yeah, we'll match them to that flow. So we use AI to do that as well.
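[Editor's sketch] The two entry paths described here, matching a clear message straight to a flow or asking a clarifying question when the message is ambiguous, might look like this in outline. Keyword matching is used only as a stand-in for the AI matching mentioned in the conversation; the flow names, phrases, and clarifying question are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical flows and trigger phrases; a real system would use a model
# and the customer's business context rather than keyword lists.
FLOW_KEYWORDS: Dict[str, List[str]] = {
    "refund": ["refund", "charged twice", "money back"],
    "card_blocked": ["card blocked", "declined"],
}

CLARIFYING_QUESTION = (
    "Can you tell me a bit more? For example, are you waiting on a refund "
    "or on a transfer?"
)


def match_flow(message: str) -> Optional[str]:
    """Return a flow name only when exactly one flow matches."""
    text = message.lower()
    matches = [
        flow for flow, phrases in FLOW_KEYWORDS.items()
        if any(phrase in text for phrase in phrases)
    ]
    return matches[0] if len(matches) == 1 else None


def route(message: str) -> Dict[str, str]:
    """Enter a flow when the intent is clear; otherwise ask the customer."""
    flow = match_flow(message)
    if flow is None:
        return {"action": "clarify", "reply": CLARIFYING_QUESTION}
    return {"action": "enter_flow", "flow": flow}


print(route("Where's my money?"))
print(route("I want a refund for my order"))
```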
[35:55]
A
Okay. But not for the end user. For your customer who has to set up what should happen on a refund request. Like you talked about, this idea of a lot of it's in their head. We're going to start with Evals. How are you presenting that to them? What does that interface look like? What do they experience? Like, I'm a new customer. I just started using Lorikeet. I want to set up my refund flow. What does that look like?
[36:21]
C
Yes. For a customer support team. Right now, the latest and greatest is this Coach interface. So it is a conversational chatbot thing that does live in our app. And you can start by literally saying, because a lot of our customers, the companies we work with, a lot of them already have their customer support set up. They know refund is the top volume question that we get. So they might want to pick that to start with. They might want to give Coach their SOP and do that. Or if they've already connected their ticketing system, we already see the tickets coming in and Coach can go, Ah, I noticed that based on your SOP and based on these tickets, this is what's happening. Can you tell me what is good or bad? And I can tweak these instructions for you.
[37:08]
A
Okay. So it's also doing this ongoing evaluation of how did we improve this sop?
[37:16]
B
Yes, exactly. Right. And we recently moved our main web UI to really focus on this Coach persona. And so the default view of the web app is now the text box that Charmaine was describing, as a conversational thing. And one of our designers actually got rid of our left nav and replaced it with the dialogue panel instead, with a discreet little menu for people to navigate with tucked away in a different part of the screen. So we're really bullish on the idea that an agent this powerful is much, much better at managing config and prompts and all of that kind of jazz than expert humans are.
[37:57]
C
We still need expert humans to review what the bot did. Sorry, what Coach did. And that's where we're finding that, yes, Coach defaults to this conversational interface. But for one of the features that I've been working on recently, we still need that UI to make it super easy for the expert humans to review what it did.
[38:21]
A
Yeah. Okay. I want to dig into this as well. But first I want to ask: you're making this big bet on a conversational box right on your homepage. Do you ever get feedback from customers? Like you mentioned earlier, Charmaine, they don't know what to put in that box. This is a real challenge with chat interfaces. So what do you do to help with that step? If I just come there, how do I know to start with my refund flow?
[38:47]
C
Oh, so we also have the classic FCC and ChatGPT, those little prompt bubbles to tell you what you can get AI to do. What we heard from our customers though is that they don't click on that because they know, like in ChatGPT, it's not that helpful because it doesn't have context. So what we're starting to do is actually, and this is why I talk about how, yes, conversational by default, but we still have the app. Some of our customers said that they want it on the page in the app that they're working on. Like the feature I'm working on right now with the team is simulations, for example, testing, evaluation. If they want Coach's help to create some evals for them, our customers do not want to click into the prompt in the big conversational interface, but they do look for a button in the simulations page to say, help me with this specific flow, create scenarios for this. As in, what we're trying to do is find places within the familiar web app UI where we already know the context. Do you want AI specific?
[40:01]
A
This is a really interesting UX challenge, I think. I experience this a lot in the products that I use and even in the products that I'm building: chat is really great for when you just need to communicate something and there's no button in the UI for it, but it really struggles when it's just a blank page. And I think we're starting to see this evolution of these hybrid interfaces. We see it even in ChatGPT and Claude, where there's a canvas and then there's the chat and you're building something together. I think for modern-day UX problems, this is a really meaty space. Charmaine, are there things that have come up as you've dug into this? And Rona and Jamie, feel free to jump in as well. But I'm curious, how are you making these trade-offs? How are you thinking about when to do what?
[40:55]
C
Yeah, this is a huge topic. So aside from this hybrid thing of doing stuff within the app, there's also another hybrid. The classic customer support AI thing is either just question and answer, or, with some of our competitors, I think what they do is the AI just helps you come up with the answer, but the human still has to respond. And then on the other side, it's like you just let AI handle all of it. But, Rona, the HITL thing, do you want to talk about that? There's this middle ground, I think, that we've come up with that Rona was going to experiment with. Yeah.
[41:33]
D
So we ended up calling it resolution in the loop. What that entails is essentially going back to what Charmaine was saying, that there's that gap of exactly what AI can and cannot solve. What we call that is the hard ceiling and then the soft ceiling. And there's a nice sweet spot between them that we want to go and optimize. What that means is that the AI is just shy of getting to the answer, but can't quite answer it just yet. And so we think about, okay, what can we do in the mix to get the human in the loop to be like, okay, I'm just going to unblock you here, Lorikeet, tell me what you need. So oftentimes it would be things like, okay, I just need to know what the answer to this question is. Right. So it's not really a part of your knowledge base. And so then the human comes in and answers that question, and then we focus on feeding that back into the knowledge base again. So that's something that we wanted to go and explore a little bit more. And we did a little bit of UXR in that space as well. And it looked like things like regulations and whatnot, at one of our fintech companies, were a very big reason as to why a human would have to be in the loop, and then be able to go and unblock that AI and continue that conversation, so that you don't have to leave the customer jumping around and waiting for responses that are a bit too late.
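[Editor's sketch] The resolution-in-the-loop pattern Rona describes (the agent hits a knowledge gap, asks a human for just that piece, continues the conversation, and the answer is fed back into the knowledge base) can be sketched as follows. The dictionary knowledge base and the ask_human callback are assumptions for illustration, not a description of Lorikeet's internals.

```python
from typing import Callable, Dict, List


def resolve(question: str,
            knowledge_base: Dict[str, str],
            ask_human: Callable[[str], str]) -> str:
    """Answer from the knowledge base, asking a human only on a gap.

    The human's answer is written back to the knowledge base, so the same
    gap does not block the agent a second time.
    """
    key = question.strip().lower()
    if key not in knowledge_base:
        # Knowledge gap: pause, ask the human for just this piece, then
        # continue the conversation instead of handing off the whole ticket.
        knowledge_base[key] = ask_human(question)
    return knowledge_base[key]


# Hypothetical usage: the "human" is a function that records what it was asked.
asked: List[str] = []


def human_operator(question: str) -> str:
    asked.append(question)
    return "Transfers over the daily limit need a manual compliance check."


kb: Dict[str, str] = {}
first = resolve("Why is my transfer held?", kb, human_operator)
second = resolve("why is my transfer held?", kb, human_operator)
print(first == second, len(asked))
```

In this sketch the second, identical question is answered from the updated knowledge base and the human is not asked again.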
[42:54]
A
Yeah. Okay. So I want to make sure I understand this correctly. I can see two extremes. One is we're just going to let Lorikeet respond to email on its own. The other extreme is I don't know how to respond to this, I'm going to escalate it to a human. And what you're describing is a middle ground of, I'm trying to respond to this, I'm stuck, but I could respond to this if I just knew this thing. So I'm going to ask a human the thing, the human's going to respond, and then Lorikeet is going to continue to handle the ticket.
[43:24]
C
Yes.
[43:25]
A
Yes. Okay. This makes me think of another episode we did with this company incident.io. So they do site reliability. They are building an AI site reliability engineer. So imagine you wake up to a page at 2 in the morning because your site is down and you've got all your engineers on Slack, all in their pajamas on their computers, trying to figure out what's wrong. Their AI SRE is also in Slack trying to figure out what's wrong. And they may be the most sophisticated team I've interviewed because their agent spins up sub-agents, it generates hypotheses, it gets other agents to go investigate each hypothesis. And then what's really cool, Charmaine, about your comment about AI-human interaction: their agent is communicating with the team in Slack. I have a question. I can't test this hypothesis on my own. Does anybody know this? Or an agent will find something and throw it into Slack, and another engineer can be like, I already investigated that. You can rule out that hypothesis.
[44:30]
C
Yes. So the funny thing is we also have this agent, we call it Laura Zowski and we talk to it in Slack and when an alert comes up, it will investigate it for us. But yes, it's a similar concept to what Rona was saying. Imagine that kind of thing. But for customer support.
[44:47]
A
Yeah. This is one of the areas that like okay, it's cool to geek out on generative AI. Like it may be a life changing or humanity changing technology. I get it. The things that like it's Thursday that make me really excited is exactly this space of we're still learning what this technology can do for us and we're still learning how do we collaborate with it.
[45:11]
C
Yes.
[45:11]
A
And so this idea of we don't have to live at the extremes, you do it all at the automation or I do it all. There's this like really fun collaboration space.
[45:22]
B
I really enjoy the analogies that we get to see between customer interaction and user interaction. And so the kind of knowledge flows that Rona is describing, where our concierge customer-facing agent has a little knowledge gap and it pauses and gets some input and then continues, or spins up a sub-agent to go call someone to get some information it needs or something like that, that's mirrored in our app, I think, with people's interactions with Coach. Because often it can just do stuff for you, but most of the time it needs to check something: hey, is this what you meant, or am I right in seeing a knowledge gap here in your knowledge base material, or that kind of thing. And so those loops show up on both sides.
[46:02]
C
That messy middle bit. It's not this extreme or that extreme. There's a middle ground where human and AI can collaborate. What we're finding, because we see it in concierge and in Coach and also in our day-to-day when we're dealing with AI, is that the ideal balance depends on the task and depends on the expert human that's working on it. And what we've been focusing on lately is what are the critical things that you need the AI to spit out for the human to review it efficiently. Yeah, if that makes sense. It does, yeah. The thing is, at the company I worked in before, before Gen AI, there was this machine learning app. We rebuilt the app, and we left the machine learning models the same. All we rebuilt was the UX, just the UX. And yet our users were four times more efficient with their tasks. And all we changed is how the humans can review and correct what the AI got wrong. So I think that bit of defining what good is, and making an interface for users to easily define what good is and easily review the most important things they care about, I think that's the fun bit.
[47:23]
B
Do you mind if we just return briefly to the topic of the prompt, sorry, the message suggestion boxes that Charmaine was describing earlier as a UX mode? I just want to deliver a short rant on this, which is I hate those things and I feel vindicated that they didn't work very well in our product for our Coach thing. And I think it's related to something that we see over and over again in customer support, which is users don't care about your org chart. And often those things do reflect internal concerns. The classic thing is you pop open a customer support chatbot and instead of being able to type, it's got these suggestions and they're always something like, do you want to report a bug, ask a question or file a complaint? And those clearly map onto internal teams, which is who's going to respond to the ticket. But I'm there thinking, I don't know, I've got a question about this thing that happened and I guess I'm upset about it. Is that a complaint? Maybe it's a bug. I'm not sure. Like, yeah, humans shouldn't have to care about your org chart. And I think they can often just jump in and start talking. So we need to find different ways to prompt or to guide people into the context that the AI needs.
[48:33]
A
I think I actually 100% agree with this rant because the first time I saw those little, like, pills on ChatGPT, I was like, I don't get it. First of all, I use ChatGPT for 4,000 things, and you're going to communicate that value to me in four suggestions. One, that's the first problem I had with it. The second problem I had with it was like, you're a conversation interface. Why not just start with a conversation? It's such a weird. It's not taking advantage of the strength of the tool. This is where I think we're still learning how to let go of the patterns and norms from the old world and reinvent the patterns and norms for the new world.
[49:16]
B
Exactly.
[49:17]
A
Okay, so you have a concierge agent. You have a coach agent. I think there's a third agent that maybe was referenced in your application. Is there one around evals and making continuous improvements, or is that what the coach is doing?
[49:36]
C
We put that into coach. So coach for us is anything that helps you make your concierge better.
[49:42]
A
Okay, yeah.
[49:43]
B
It's an evolving concept, but we're basically leaning into coach maximalism. Everything that is improving the product is part of the coach Persona.
[49:50]
A
Okay. And then let's talk a little bit about. Let me just ask the question this way. We've talked to a lot of people that are building agents. We've talked with a lot of people about. I'm saying we because I'm including all my listeners with agents that do customer support. But I think in every conversation, there's been really interesting, almost like simple things that were surprisingly hard. So I'm curious about. As you've built out these two agents, what surprised you about on either end? Like, it was surprisingly good at something, or you thought it would be good at it, and it was just. It became a tricky scenario.
[50:34]
C
So what I've learned is that a lot of our customers, like I said, some might have SOP binders, some, it's all in their heads. They expect sometimes AI to come in and save them when they don't have the processes quite mapped out yet for themselves. And that actually is what makes it really difficult. And so having coach guide them, I think that really helps. But what I also learned is that sometimes companies can't fix the process. So for example, a company that relies on some third party company for their processes to be sorted out, for them to answer their customer support tickets or this health tech company in the US and they have to deal with US insurance. And however that works. I don't know how that works. I'm Australian, but I heard it's messy. So yes, we need to help our customers clean up their processes so that AI can automate it. But some of those processes they have no control over. And how do you deal with that then? Yeah, so
[51:40]
A
yeah. And okay, there's. I think there's two things, Charmaine, that I just heard in your response. There's this first, we can only automate or even augment, forget fully automate, but like we can only augment what we can clearly define. And what's nice about LLMs is clearly defined. Just got looser. But it's not totally loose, right?
[52:03]
C
That's right.
[52:03]
A
It's looser than deterministic code. But still, we've got to be pretty good about telling an LLM what we want it to do. But humans have this enormous capacity to live in this very ambiguous, gray, fly-by-the-seat-of-our-pants way. I'm sure there's plenty of companies out there that barely have a process and they still respond to customers, and maybe there's some heroic employees behind the scenes. This seems like one of the hardest challenges with customer support: how do you even help your customers mature their processes so that they can better leverage your tool? Are there things you're exploring there? I know the coach might be a way to help with that.
[52:50]
C
It was a mix of Coach and, even before Coach, this iterative process. So for example, Lorikeet doesn't require that your knowledge base articles are super pristine before you release Lorikeet, because again we default to, if Lorikeet's not sure, it'll hand off to a human. So it's safe in that way. So firstly, we make it so you can iteratively and incrementally get Lorikeet out there. And as it's out there, we have these ways to evaluate how Lorikeet is performing at scale. So we can say, ooh, 50% of the things being handed off to humans are about this kind of question. Maybe you need to improve your knowledge base articles on this.
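[Editor's sketch] The "50% of handoffs are about this kind of question" analysis comes down to grouping handed-off tickets by topic. A minimal version, assuming each ticket is a dictionary with hypothetical topic and handed_off fields:

```python
from collections import Counter
from typing import Dict, List


def handoff_breakdown(tickets: List[dict]) -> Dict[str, float]:
    """Share of human handoffs per topic, most common topic first."""
    topics = [t["topic"] for t in tickets if t["handed_off"]]
    if not topics:
        return {}
    counts = Counter(topics)
    return {topic: n / len(topics) for topic, n in counts.most_common()}


# Made-up sample data for illustration only.
tickets = [
    {"topic": "refund", "handed_off": True},
    {"topic": "refund", "handed_off": True},
    {"topic": "login", "handed_off": True},
    {"topic": "billing", "handed_off": True},
    {"topic": "login", "handed_off": False},
]

for topic, share in handoff_breakdown(tickets).items():
    print(f"{topic}: {share:.0%} of handoffs")
```

The topic at the top of the breakdown is the natural candidate for a better knowledge base article.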
[53:40]
A
Yeah, it seems like the crux of this though is you've had to figure out how to train Lorikeet to know when it can't respond to something like such a simple thing. But I know it's probably not.
[53:51]
C
Yes, right. That's right.
[53:53]
A
LLMs are very confident and they're very confidently wrong. So tell me a little bit about what you do there.
[54:00]
C
I just want to. What I found really funny is because you know how AI is already super helpful.
[54:07]
A
Yeah.
[54:07]
C
In our prompts because we say, hey, you're an AI customer support agent. It's even extra helpful because it thinks it's customer support. It needs to help you. So I'll take, I'll let Jamie talk more about that.
[54:18]
B
Yeah. Oh, well, you're right. I mean, these models are fine tuned. They're very strongly biased towards being helpful, enthusiastic, a little bit overly earnest, and they have a rough idea of what good customer support looks like. And so they're just absolutely bursting with this enthusiasm to promise things or offer things to you because it's the kind of stuff that good customer support agents might do in that situation, regardless of what's actually happening.
[54:49]
C
Yeah. And what we're doing around that is a mix of the usual LLM as a judge. It's in our prompting. We literally explicitly ask it to evaluate its own confidence levels. And on top of that we have guardrails as well.
[55:07]
A
Yeah. Say more. If you can say more about the guardrails you have in place.
[55:12]
C
The guardrails are more like, it's also still LLM as a judge. So Lorikeet, based on its training, will come up with a response, and before it goes out to the end customer, a bunch of other little isolated LLM checks run against it to check, is that good? If not, what should I do? Should I escalate or should I steer it, as in tell it. Yes. If you've seen that pattern in coding agents, where you notice that its thinking is off, you can type in a message to steer it in the right direction. So guardrails do that. And the key thing about guardrails is there's some guardrails that are system-wide, as in, we think they make sense by default for all customer support. But a lot of it is configurable because our different customers have their own specific idea of what is super risky for them. So they have to configure what guardrails to check.
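[Editor's sketch] The guardrail pass described here, isolated checks that run on a draft reply before it is sent and can let it through, steer it, or escalate, could be structured like this. The keyword checks below are stand-ins for the LLM-as-judge calls mentioned in the conversation, and the specific rules, names, and action labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Verdict:
    action: str  # "pass", "steer", or "escalate"
    note: str = ""


Guardrail = Callable[[str], Verdict]


def no_refund_promises(draft: str) -> Verdict:
    # System-wide default (assumption): do not promise a refund outright.
    if "we will refund" in draft.lower():
        return Verdict("steer", "Do not promise a refund; explain the review process.")
    return Verdict("pass")


def no_loan_offers(draft: str) -> Verdict:
    # Customer-configured rule (assumption): never mention loans.
    if "loan" in draft.lower():
        return Verdict("escalate", "Draft mentions a loan; route to a human.")
    return Verdict("pass")


def review(draft: str, guardrails: List[Guardrail]) -> Verdict:
    """Run every guardrail; escalation wins, then steering, then pass."""
    steer_notes: List[str] = []
    for check in guardrails:
        verdict = check(draft)
        if verdict.action == "escalate":
            return verdict
        if verdict.action == "steer":
            steer_notes.append(verdict.note)
    if steer_notes:
        return Verdict("steer", " ".join(steer_notes))
    return Verdict("pass")


SYSTEM_WIDE: List[Guardrail] = [no_refund_promises]
CUSTOMER_CONFIGURED: List[Guardrail] = [no_loan_offers]

print(review("We will refund you today.", SYSTEM_WIDE + CUSTOMER_CONFIGURED))
```

Keeping the system-wide and customer-configured lists separate mirrors the point that some checks apply to all customer support while others depend on what each business considers risky.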
[56:07]
A
Oh, interesting. Okay, so this is the second time you've mentioned customers being involved in defining evals and now guardrails.
[56:15]
C
Yes.
[56:15]
A
Let's talk a little bit about this because I think this is very unique and it resonates with me. When I first learned about evals, my first concern was it's another realm of products where we're relying on a domain expert to make decisions that maybe our customers should be deciding. And so I really wanted to push on: how do we get evals closer and closer to the customer being the judge? And it sounds like you're really moving in that direction. Tell me a little bit more about that.
[56:44]
C
Oh, I guess in the beginning our guardrails were very hard-coded, things like: if the customer is sounding hostile, do blah, blah, blah. But what we found... okay, there was this one health tech company, their whole business is weed. And one of their end customers asked Lorikeet, "Hey, where's my weed?" It's a legitimate question for this business. But that guardrail fired and said, "Please stop being hostile." So we learned early on that guardrails are quite dependent on the domain that business is in.
[57:23]
A
Yeah, so
[57:26]
B
one of those, like, weird, uncanny kind of language model associations where I guess in the pre-training, words about medical cannabis and weed and dope were just in the same cluster as being impolite or rude or hostile or something. Something that would never occur to a human in that context, but it seems natural to the LLM. But so yeah, we've made them very customizable. And the other thing I like about guardrails is that it lets our users prove a negative, which is otherwise very difficult, because a pretty common pattern is for a compliance team to say, can you guarantee to me that this thing will never offer a loan to someone in a financially vulnerable group, or something else that the regulators really care about. And of course with normal language models and evals and stuff, you can get pretty close and you can try your best and you can convince yourself that it is very unlikely. But having this cross-cutting thing means that you can guarantee... we practically guarantee that if that happens, it'll be detected and then immediately caught by a separate system.
[58:29]
A
Okay, so let's talk through this. Let's say I work at this weed company and I don't like the fact that the hostile eval or guardrail is catching the responses because they're totally valid customer input or valid responses. What is that? Like how are you getting your customer to define what guardrails they care about?
[58:50]
C
I'll start with the weed company. The thing is, it's because back in the day the hostility guardrail was hard-coded and very isolated. What we do now is that each time, say, a new customer support team starts using Lorikeet, you tell Lorikeet what your business is even about. So you give it context about your business. And now that hostility check can see that context. And so the hostility check won't fire because it knows you're in the weed business. So that's one way of doing it. But guardrails in general, I think, like Jamie mentioned, it depends on their regulations to decide what kind of guardrails you need, if that's what you need.
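Editor's sketch: one way to picture "the hostility check can see that context" is a judge prompt that carries the business description along with the message being checked. The function name, the prompt wording, and the HOSTILE / NOT_HOSTILE labels below are all made up for illustration; the episode does not describe Lorikeet's actual prompts.

```python
def build_hostility_prompt(business_context: str, customer_message: str) -> str:
    """Assemble a judge prompt that includes what the business does.

    Hypothetical template: the point is only that the check is no longer
    isolated from the domain it runs in.
    """
    return (
        "You are checking a customer support message for hostility.\n"
        f"Business context: {business_context}\n"
        "Words that are normal for this business are not hostile.\n"
        f"Customer message: {customer_message}\n"
        "Answer with exactly one word: HOSTILE or NOT_HOSTILE."
    )


prompt = build_hostility_prompt(
    business_context="A health tech company that sells medical cannabis.",
    customer_message="Hey, where's my weed?",
)
print(prompt)
```

With the context line present, a judge model has what it needs to treat "where's my weed?" as an ordinary order-status question rather than something rude.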
[59:30]
A
Yeah. So are they just maybe this is happening through the coach, or do they just communicate to you these are the things that can never happen? And then you create the guardrails for them. What does that back and forth look like?
[59:44]
C
Oh, either way. You can use Coach. It's still in the UI: literally "Create guardrail", and what to do when this guardrail fires, either steer or escalate. So you could do it in both the UI and in Coach.
[59:55]
A
Yeah. Okay, so let's just talk through this. Create this guardrail. I'm doing it in the ui. I can imagine most humans don't describe a guardrail clear enough to be something you can then turn into code or even a real LLM prompt. Like, when they click the Create guardrail button, are they just getting a box to type in never do this? And then you figure out what to do with that?
[60:18]
C
So the funny thing is, yes, that is one way to do it. And again, that's what we're trying to invert, as in: what if, instead, you define the test case of, hey, when a customer does this, what I want it to do is escalate, for example?
[60:33]
A
Okay.
[60:34]
C
And then you have that test case, and then you could either try to put it in the instructions and see if that will pass. You will probably run it 100 times and you'll see maybe it still happens 2 of the times. So then what Coach would do is it would actually recommend, hey, for these kinds of cases, if you absolutely want this to not happen, I recommend here's a guardrail. I'll draft it for you and then it will run the simulations again, the test case, until it gets better.
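Editor's sketch: the loop described above (define a test case, run it many times, and recommend a guardrail if the bad outcome still slips through) can be illustrated as follows. The names `failure_rate`, `recommend`, and `make_flaky_agent` are hypothetical, and the agent is a deterministic stub that misbehaves on two chosen calls so the numbers match the "2 out of 100" example in the conversation.

```python
from typing import Callable, Set


def make_flaky_agent(fail_on: Set[int]) -> Callable[[str], str]:
    """Stub agent: returns 'escalate' except on the listed call numbers."""
    calls = {"n": 0}

    def agent(message: str) -> str:
        calls["n"] += 1
        return "reply" if calls["n"] in fail_on else "escalate"

    return agent


def failure_rate(agent: Callable[[str], str],
                 customer_message: str,
                 expected_action: str,
                 runs: int = 100) -> float:
    """Run the same test case `runs` times and report how often it fails."""
    failures = sum(
        1 for _ in range(runs) if agent(customer_message) != expected_action
    )
    return failures / runs


def recommend(rate: float, tolerance: float = 0.0) -> str:
    """If failures exceed the tolerance, suggest drafting a guardrail."""
    return "draft a guardrail" if rate > tolerance else "instructions are enough"


agent = make_flaky_agent(fail_on={17, 63})
rate = failure_rate(agent, "Can I get this on credit?", "escalate", runs=100)
print(rate, recommend(rate))
```

A tolerance of zero reflects the "absolutely want this to not happen" case; a team with a softer requirement could pass a higher tolerance and rely on instructions alone.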
[61:04]
B
Yeah. And I reckon that's such a more effective way to create this complicated config and prompts and stuff. And it's also an interesting example of where the chatbot Coach persona gets surfaced to people because, like, to your earlier question, it's quite hard to be faced with just an empty chatbot box. A pretty common flow we actually see is that our users will be reading through a ticket and they'll say, "Hey, it shouldn't have said, 'I could offer you weed on credit,'" or something like that. "Okay, cool. What do you mean by that? Like, tweak the instructions for the agent?" And then they might say, "No, the regulators have been really insistent that we cannot offer credit for this weed," or whatever the particular scenario is. "Oh, okay, no problem. Let's create a guardrail for you." And that concept, which is not necessarily front of mind for our users, it can just go handle for them.
[61:53]
A
You know what I love is that we're basically putting our traces and our error analysis in front of our customers. And then when they identify the problem, you're using an agent to be like, okay, we'll just solve this for you.
[62:04]
C
Because sometimes you don't know why AI did a thing. So like Rona said, we built in some foundational things to basically build out the trace and map out which ones are the important bits. Because sometimes if the customer support person doesn't know why AI said the thing, it'll just say. The customer support person can just say, it shouldn't have said this. It should have done blah, blah, blah. Instead we actually just get AI to run coach, to run through it and diagnose it for us and also suggest the fix.
[62:34]
A
Okay.
[62:34]
C
And it will just pull the human in the loop when it's not sure what actually do you want it to do? Is this test case correct? And then it can diagnose, try to fix it, iterate through the fix until the test case passes.
[62:49]
A
I'm hearing an idea that I'm definitely going to steal, so I want to highlight it. Yeah, I am definitely going to set up an agent that looks at my traces and diagnoses where the failure mode came from.
[63:02]
C
Yes.
[63:02]
A
That is just pure genius. Right? It's already enough work to even identify there's a failure mode. Now go investigate it. That I will definitely steal because it just sounds amazing. Yeah, I love this. I selfishly started this podcast so that I could get better at building AI products, and you just gave me a little nugget to take away. Okay, we are coming up on time. Is there anything I haven't asked you that you wish I did?
[63:29]
C
Because your background's in product discovery and all this, I was actually thinking about the idea that Rona was describing earlier and just how engineers at Lorikeet come up with ideas. I think that might be interesting. Only if Rona wants to share it.
[63:46]
D
I think this is... it is good to talk about, because at Lorikeet, it's like we're not just heads-down builders kind of thing. Right. It's more like we're beyond that horizon. Like, we wear all those hats. And we're very much encouraged to wear all these hats. Right. Like, for example, Jamie, at the start-of-week meeting every week, asks everyone in the engineering team, what is one thing that you learned from a subscriber in, like, the last week? And so we're really encouraged to stay in tune with all of our subs and also perform those UXR discoveries as well. Right. So validating those hypotheses, being able to push it through. Like, at Lorikeet, we do this staged process of launch.
[64:23]
A
Right.
[64:23]
D
So from alpha to beta to launch, and then at every step, being very intentional about it, because the last thing that we want to do is create all these features, but then we can't, one, maintain it, or two, it doesn't even make sense to even push it out. Right. That kind of thing. And so it's like at Lorikeet, like, we're all very much encouraged to be, like, product engineers across the board.
[64:46]
C
Especially in this age where AI makes things so much faster to build, it's even more dangerous to have so much AI slop in the code base that's useless.
[64:57]
A
Yeah, this. Oh, this is something I think a lot about. I think I shared. I started as a designer slash web dev, and then moved into product and then back into engineering. I just feel like the trio model, like, if you work in an organization where all those roles exist, you absolutely should be collaborating. But I also think this idea of the trio collapsing is one of the best things that could possibly happen. And, like, to me, this is the way that, like, the best teams that I've worked on have always worked. Nobody cares what the job titles are. We just jump in and do the work. And one of my favorite trends that's happening right now is we're seeing teams get smaller. We're seeing teams play multiple roles. We're seeing engineers get excited about uxr, we're seeing designers write code. Like, I. I love all of this. But, Charmaine, you also touched on what I think is the, like, dark side of that, which is I think a lot of companies are going to make mistakes and release way too much, and we're going to lose that filter around product coherence and product quality. And so I think we're going to be wrestling with this. Like, we now can build any idea that we've ever had, but should we?
[66:10]
C
Yep.
[66:10]
A
And I think this is becoming the most important question.
[66:14]
C
Yep.
[66:15]
B
Yeah, 100%.
[66:16]
A
Yeah. I love that. Across, like, how big, how many employees are at Lorikeet?
[66:20]
B
We just crossed 70. 70, it's still pretty small.
[66:25]
A
And do you have product managers, designers and engineers, or is everybody a product engineer? What does that look like?
[66:33]
B
We have a handful of product managers and some very talented, very overworked designers. Probably too few of them, but yeah, mostly engineers. We hire very product managed... sorry, very productive, product-focused, user-focused engineers.
[66:46]
A
Yeah, that is very clear to me from this conversation. And I'll share. I think people made a big deal about Stripe didn't hire an engineer, a product manager until they were like over 200 people. Stripe will say that was a mistake. Like they'll look back and say we should have hired one sooner. But the reason why they were able to make that mistake for so long is they prioritized hiring product minded engineers from day one. And some people take this to mean that like product managers aren't needed. Again, I think we're making the mistake of hyper focusing on a role. Like, I think the takeaway is everybody in your organization should be product minded. Right?
[67:26]
B
Yeah. My co founder was a product manager at Stripe and so that kind of thinking has really influenced a lot of how we think about things. And one saying from Stripe that we often think about and we really like is that it is very hard to turn billions of dollars into quality software. That's the essence of our moat. I think that it's just like this is a really hard kind of job to pick the right thing to build and the right user problem to solve. And it's what gets us really excited every day.
[67:56]
A
Yeah, amazing. Okay. This has been absolutely delightful. I really appreciate you taking the time. If you enjoyed this conversation, please subscribe in your favorite podcast app and give us a rating as it helps others find the show. Thanks, I appreciate it.