Summary7 min read

Just Now Possible with Teresa Torres
Episode: Building a Career Co-pilot for Disadvantaged Students: How Zero Gravity Bridges Knowing and Doing
Date: January 8, 2026

Episode Overview

In this episode, Teresa Torres dives deep with Dan (Software Engineer) and Elliot (Product Manager) from Zero Gravity—a UK-based organization helping disadvantaged students access elite educational and career opportunities. The focus: Zero Gravity’s journey from a mentoring platform to building an AI-powered “Career Copilot.” They explore how AI bridges the “knowing-doing” gap for students, discuss product design challenges, and give a transparent look at agent architectures, context management, evaluation strategies, and safeguarding protocols in AI-driven mentoring.

Zero Gravity: Mission & Context (00:54–07:15)

Mission: Zero Gravity empowers disadvantaged UK students by breaking down barriers to accessing elite education and career opportunities.
- Notable Quote:
  “Zero Gravity essentially helps disadvantaged UK students access elite opportunities and careers. … We’re trying to break down those barriers to access and to network.” —Elliot (00:59)
Founding Story: Rooted in founder Joe Seddon’s own experience overcoming socio-economic obstacles.
Business Model: B2B2C platform:
- Partnering with state schools/universities for student referrals
- Corporate partners buy into the pipeline to access untapped talent.

Understanding the “Knowing-Doing Gap” (02:26–06:50)

Problem Space:
- Students from less-connected backgrounds lack awareness of career paths and the steps needed to access opportunities.
- Many don’t know what “good” looks like: insider info, networks, and strategies are often absent in their communities.
- Parallels in the US/UK: Lack of exposure to “hidden curriculum,” e.g., internships, SAT prep, networking (05:44)
- Imposter syndrome and doubting self-worth are common hurdles—even among high-achievers.

Zero Gravity’s Core Service (07:15–09:27)

Mentoring at the Core:
- Proprietary algorithm matches eligible students to mentors from similar ambition areas, e.g., “math at Oxford.”
- Emphasis on the power of relatable human connection:
  “I can see myself in this person.” (07:15)
- Natural pathway for successful mentees to become mentors.
Safeguarding: A critical pillar due to working with minors (age 16+).

Introducing the AI Career Copilot—Vision & Early Prototyping (09:27–14:15)

What Is It?
- “An AI career copilot… acts as an orchestrator which bridges the gap between knowing and doing.” —Elliot (02:26)
- Analyzes student context (career ambitions, activity on platform) and suggests/highlights impactful next steps.
Key Choices & First Steps:
- Focused on orchestrating existing resources, not automating mentorship itself.
- Notable Quote:
  “We’re trying to build an orchestrator, not an automation tool.” —Dan (09:27)
- First prototypes explored augmenting, not replacing, human mentors.

Discovery, Scope, and Prototyping Journey (12:28–19:47)

Analysis Paralysis & Technical Temptation:
- Initial inclination toward complex architectures (vector databases, RAG, LLMs via Azure).
- Pulled back to focus on the users’ immediate needs, not just on “what’s cool.”
- Importance of regular user feedback, weekly mentoring from the whole team.
Prototyping Mindset:
- “It’s so easy to get carried away by the hype.” (14:55)
- Advice: “Go down the complex route, then work backward.” (17:21)

Product Architecture & Evolution (21:08–28:48)

Pragmatic Architecture:
- Chose not to start with RAG/embeddings; focused on leveraging structured data and search for matchmaking.
- Early “agentic” design centered on tool calls for data and action orchestration.
First Student Interface:
- Entry point: Button on job cards presenting a modal chatbot.
- LLM-generated summary compares student’s profile to job requirements, suggests next steps.
- Initially just one-off summary, not a back-and-forth chat.
User Feedback & Iteration:
- Early user response: “Oh, that’s nice. Cool… but not a co-pilot moment.” (28:48)
- Students needed more explicit, interactive feedback—generic summaries felt impersonal.

From Prototype to Fully Agentic “Copilot” (30:37–38:11)

Learning Through Testing:
- Students needed to feel the “LLM magic,” perceiving the recommendations as truly personal and dynamic.
- Decided to “open up” the chat interface to allow iterative, conversational guidance.
- Reluctance to simply drop in a generic chatbot—emphasis on embedding context (“doorways”) and guidance, not leaving students with empty text boxes.
Guided Prompts & UX:
- Chat now starts with a summary/prompt, then allows the student to dig deeper.
- Copilot asks questions too—encouraging self-reflection and follow-up action.

Technical Deep Dive: Context Management, Safeguarding & Evaluation (40:40–58:06)

Context Management (40:40–51:58)

Multiple Tools:
- Data fetching (student profiles, mentoring, opportunities, engagement data).
Token Management & Summarization:
- Systematically compacts chat history; removes redundant tool calls and irrelevant data.
- Only recent user behavior (~3 weeks) is summarized; avoids sending full platform history.
- Exposes tool availability per turn, controlling which actions are open.
- Notable Quote:
  “We decided quite early on that we wouldn’t go down [the embeddings] route.” —Dan (27:40)
  “We have a hash representation; at the beginning we just allowed the tool call to fetch that entire document… now we just present the changes.” —Dan (45:38)
Agents & Orchestration:
- Single orchestrator agent (loop) manages which tools/context are relevant per turn.
- ChatGPT models chosen per response type (higher-level models for summaries, lower latency ones for back-and-forth).

Safeguarding & Moderation (51:58–57:39)

High Priority: Built into the product from the start.
Layered Approach:
- All user and AI-generated messages pass through OpenAI’s moderation and an external partner (Unitary) for advanced checks.
- Moderation step adds insignificant latency.
Observability:
- All interactions are reviewable; staff are trained in safeguarding and able to tag/monitor conversations for quality and risks.

Evaluating Output Quality (58:06–62:49)

Current State:
- “Spotting trends” via observability tools (e.g., Metabase).
- Internal “red teaming” and labeling to identify issues: hallucinations, tone, stale/vague recommendations, forced encouragement, etc.
- No large code-based/automated evals yet, but aspiring to scale with data sets, code checks, and LLM-as-judge methods.
- Notable Quote:
  “People just see a bad output, whereas…we’ve got to really pay attention to and tag.” —Elliot (58:06)

Current State & Impact (38:11–40:30)

Copilot’s Role Today:
- “Tightening orchestration” of all platform features—students are guided to upskill, update their profile, join the community, or reach out to mentors, not just hastily apply to jobs.
- Copilot remains embedded at critical touchpoints, guiding action without dominating the user experience.
- Observed improvement: More thoughtful student applications and more engagement with community and mentors.

Challenges and Next Frontiers (64:23–end)

Long-term Memory:
- How should the AI remember (or forget) across a student’s years-long journey?
  - GDPR compliance, privacy, and user agency are key.
  - Inspired by cursor’s approach: user-controlled memory.
Socratic Reasoning & Action Execution:
- Future plans to make Copilot more “coach-like” and to orchestrate actions (possibly via MCP) with caution and transparency.
Ongoing Product Philosophy:
- Avoiding over-automation, keeping a human-in-the-loop, and always centering safeguarding and practical metrics.

Notable Quotes & Moments

On Bridging the Gap:
“We’re trying to build an orchestrator, not an automation tool.” —Dan (09:27)
On Early Prototyping Temptations:
“We had ideas of completely automating mentoring… But we had to bring that back down to earth and understand what the users actually need as opposed to what’s cool.” —Dan (11:35)
On the Product Journey:
“It’s so easy to get carried away by the hype.” —Elliot (14:55)
On Context Management:
“If a job hasn’t been updated… there’s no need for the LLM to go and grab that tool.” —Dan (22:18)
On Product Impact:
“People are doing the work they need to do before applying a lot better than before.” —Elliot (38:11)
On Memory & Privacy:
“How do we make sure we can allow the LLM to retain that relevant information throughout [a student’s] process that the member can then have control over?” —Elliot (64:23)
“I personally much prefer Anthropic’s approach where I control what’s in memory 100%.” —Teresa (67:30)
On Safeguarding:
“We are originally a mentoring platform and I think critically, I think AI can never make those safeguarding decisions. But it can help us surface certain things.” —Elliot (56:17)
On Evaluation:
“You just need to know what the things you’re looking out for and then be able to tag in that sense.” —Elliot (61:22)

Timestamps for Key Segments

00:54–02:21 — Overview: Zero Gravity’s mission & founder’s story
04:10–06:50 — The Knowing-Doing Gap: Socio-economic barriers
09:27–12:10 — AI Career Copilot: Vision & design choices
17:10–19:47 — Prototyping Mindset: Starting simple, not over-engineering
21:08–24:45 — Technical Architecture: Tools, context management, and agents
28:48–38:11 — User research, guidance vs. open-ended chat, evolving the copilot
40:40–51:11 — Deep technical context: Context window, tool calls, prompt engineering
51:58–57:39 — Safeguarding, moderation workflow
58:06–62:49 — Evaluating AI output, red-teaming, failure taxonomy
64:23–68:42 — What’s next: Memory, coaching, Socratic reasoning, action execution

Tone & Takeaways

Conversation is candid, reflective, and practical—focused on learning, transparency, and responsible innovation.
Emphasis throughout on NOT overengineering, always starting from student need, and iteratively improving both tech and human support.
Zero Gravity’s journey offers a roadmap for anyone aiming to build ethical, impactful AI in sensitive domains.

For Builders: Key Lessons

Resist over-automation—focus on orchestrating/human amplifying tools.
Context management, safeguarding, and evals are foundational, not afterthoughts.
Observability (of both quality and safety) needs to be baked in from day one.
Open UX choices to student control; explain what AI “remembers” and gives agency wherever possible.
Iterative, user-guided development trumps “shiny object” syndrome.

Listen for

Candid stories about navigating the hype cycle (19:47)
Insights on practical prompt engineering and token management (41:44)
Real-world lessons from red-teaming and failure taxonomies in LLM evals (58:06)
Adapting safeguarding mindsets from mentoring into AI environments (56:17)
Thoughtful debate on what “memory” in AI should look like for students (64:23)

Ideal for anyone building or evaluating AI-powered products in education, coaching, or sensitive B2C settings.

Loading summary

Transcript123 lines

[00:04]
A
Welcome to Just Now Possible with Teresa Torres.
[00:10]
B
Hi, my name's Dan, I'm a software engineer at Zero Gravity. I've been in software for about just over 10 years now. I started in the sports tech space and transitioned over to the educational space. I've been at Zero Gravity for almost four years now and most recently have been building out our new career Copilot features for the past three to four months.
[00:33]
C
And hi, I'm Elliot, I'm a product manager here at Zero Gravity. I've also been here about four years. I think I joined a few weeks after Dan actually. My background is prominently in operations and cx, a variety of different UK based startups and scale ups and yeah, most recently I've just been taking the reins on this AI focus from Zero Gravity side of things.
[00:54]
A
So tell me a little bit about what does Zero Gravity do and then we'll get into your AI product.
[01:00]
C
So Zero Gravity essentially helps disadvantaged UK students access elites, opportunities and careers. And it came from the thing, from our kind of like a dream from what our founder Joe Seddon essentially because he lived that experience of growing up in West Yorkshire from single parent household and he felt firsthand how hard it was for students like him to break through those barriers. So when I say breakthrough barriers, that's what it all boils down to. We're trying to break down those barriers to access and to network with partnering with lots of different state schools and universities and creating what I think now is a, I think we call it a B2B to C products which essentially allows schools to refer students and then corporate partners will buy into our kind of our pipeline in order to tap into that talent pool that we're creating. So about in this nutshell is what we're doing.
[01:51]
A
I love this space. I will share that I am a product of a single parent home. My family skipped a generation with college, so my grandparents went to college but nobody in my mom's generation went to college. And then I was the first of the next generation to go to college and I got very lucky in that I got accepted to Stanford and it was probably the biggest impact on just my trajectory and what became available to me. So this is definitely a problem space that resonates with me and I'm excited to get into it.
[02:22]
C
Nice. Likewise.
[02:24]
A
Okay, tell me a little bit about your AI product.
[02:27]
C
What we're doing is we are, we're building what now we're calling an AI career copilot. The name is something that I was apprehensive to use originally but copilot in the actual what it is doing for our students is very apt, I think. So ultimately it acts as an orchestrator which we are saying bridges the gap between knowing and doing on the platform. So what in that, what I mean is it analyzes the student context, understands what they do on the platform and what their ambitious are where they currently study, for example, and it helps them ultimately execute the most impactful next step within that platform space. So what we're doing at zero at the moment is that we've got the kind of career tools already in action, so we've got a community space. We originally started off as a mentoring platform, one to one mentoring all proprietary tech in terms of what we're doing and that's grown out into what is a community live masterclasses on demand learning pathways as well and an opportunity space so they can be matched with different opportunities that our partners have such as grad schemes and things like that. And whilst we have all those tools, what we essentially found out is that students don't quite know what good looks like. And that is something that might sound quite obvious, but it's something we had to understand because a lot of students can be at different scale, they can be very switched on and attuned into what they want to do. But that doesn't necessarily always mean that they know what the best next step of that is. So let's say they want to tweak their cv. If they want to upskill in certain areas or certain skills for them, it's always they're being told what that could potentially be by a teacher, by a job spec or whatever. And they're the lucky ones, right, because the others don't have a clue. And that's what we need to help them.
[04:10]
A
This is something that I'm not sure all my listeners are going to be aware of. And so I want to just expose a little bit of this and tell me I'm based in the us. There may be some differences with the uk, so let's explore that a little bit. But I know for me like I didn't know what I didn't know. So like I didn't know until I got to Stanford and was exposed to other people from different like socioeconomic classes. Like I didn't know about the types of jobs their parents had. I didn't even know those things existed. I didn't know, oh, people take classes to get good at the sat. I know that's a US based thing, but I think it will translate like I didn't know that was available and that people did that. I didn't know that. Like, your summer internships are really critical to set you up for your job after college. Like, I just, there was nobody in my family that had this knowledge to share this with me. And I remember even one moment in college, somebody was reading, like, the stock market section of the Wall Street Journal. And like, I didn't even know what the stock market was. And I remember being like, wow, I'm just at a very different starting point. Like, I didn't get exposed to, like, conversation about the global news in my home. There was just like, I had huge gaps in, like, just my knowledge about the world and how the world worked and how college worked and how jobs worked. And it sounds like that's a lot of, like, your comment about the knowing doing gap. Like, first of all, you have to know what to do. And then second of all, like, you might know I'm supposed to go take an SAT class, but do I have the means to do it? Do I know how to find it? Do I, do I know where to go to do that? Is that kind of, is this kind of along the lines of what you guys are helping with?
[05:45]
C
It's very similar. Like, everything's resonating and we're seeing a lot of similarity in the US and that kind of that disparity. We focus a lot on that network disadvantage for our students. And there are like, stats around, you're more likely your X likely more X times more likely to know somebody that went to Oxbridge or is working as a lawyer or xyz, if you went to private school compared to state school. And that is something that in the UK has been a consistent kind of topic. And you do you always hear about it growing up. And when you experience it as well, when you hear stories like Joe's, for example, and you get that like, severe imposter syndrome, when you do finally make it, it's like it's doubling down, isn't it? I put all this hard work in and looking around me, all these people are have known somebody. Maybe they knew they had a family member who was able to recommend them. Maybe they had the finances to fund them to, as you say, upskill quicker. And it's something we are currently working on, especially in the tech space as well, because there's a whole other conversation that we could be having around the tech and AI divide as well, which we can get to later on.
[06:51]
A
So exactly why I got into educating on AI, because I see that divide already and it's annoying me. Okay, so let's get into this. Tell me before we get into your copilot, specifically, give me a sense of what does Zero Gravity do to help folks with these backgrounds. What's sort of the overall service that the company is providing? And then we can use that as a foundation to get into. How is it AI helping?
[07:16]
C
What we're doing is our very core focus was mentoring. And there's the power of mentoring. So students would log in, they'd either be referred by the school or they'd log in or sign up organically in order to be able to access Zero Gravity, we basically have an algorithm which will determine the lowest areas of opportunity within the UK and it will grant you eligibility to the platform, essentially. So by doing that, you're able, you're then in that kind of catchment group of, oh, look, you've got the highest opportunity here, you've got the highest potential, and we're here to help you with that. And with the mentoring process, it was matching them with students who were in their kind of ambition areas. So let's say that you wanted to do mathematics at Oxford and you're a state school student and you sign up Zero Gravity and you're already low on confidence, you don't know anybody there, you probably the first in your family to, to apply to this university and you, you enter, you're matched for an algorithm with a mentor and we're doing the same thing as you and you have these sessions and it's just something that we've built on from there, really, that sense of mentorship and upskilling, of having somebody that's been in your position and kind of the ranks to broken down the barriers to get to university and then having that conversation with them. And I think I want to make a good point about this, is that human element has been really interesting for us because it's, oh, look, you've got. This is. I can see myself in this person. And with that, we've seen some really good success of actually throughout mentoring people, then getting into university and then becoming a university mentor themselves as well.
[08:45]
A
Yeah, this is great. So you're starting with high school kids. And what I love about this is that, like, I remember being a high school kid, I didn't know how to pick a college and my parents didn't know how to help me pick a college. And I remember, like, applying for financial aid and I just did it on my own. And it's funny, as an adult, like, I see my friends, like parents are really involved with their kids in choosing school. But I didn't have that. I didn't even know that was a thing. And it's, I, like I said, I just feel like I, I got lucky for whatever reason, I figured it out. But how many kids don't? So I love this. Okay, so you're starting with high school, you're pairing them with a mentor. Somebody who's been through it before can show them the ropes because maybe they don't have access to that at home. Now give me a sense of where does AI play a role? How does the copilot work?
[09:28]
B
I feel like just one small thing to add. There is the safeguarding aspect of our platform as well. I think one of the key pillars of zero gravity is that we provide a safe space for those things to take place, which can't always happen on unregulated platforms and stuff like that. So we're dealing with kids who are of the age of 16 upwards. That is like a key pillar to what we're doing and what we've tried to bring into our Career Copilot feature as well. So in an essence, the Career Copilot feature is something that knows you. So it knows data about what you've done on platform. It understands what your career goals are, whether they've changed recently or not. It understands what kind of interactions you've had on platform, whether you are in mentoring, whether you're more leaning towards masterclasses or whether you need like that networking help, in which case it would prompt you to go onto community and help you engage with other like minded people in the community. I think with Career Copilot, the main differentiation here is, as Elliot mentioned, that we're trying to build an orchestrator, not an automation tool. So in that sense, if you imagine we've got all the tools on platform, so we know that if people engage or our students engage with mentoring, those are more likely to get their first choice university compared to the UCAS standards. So we knew that, we understood what the outcomes were. It was just about trying to get AI to orchestrate that and put them into the right directions. And I think the key thing throughout building, building, it was like, how do we provide a service, an interaction layer in which the user can generate and keep momentum on platform?
[11:17]
A
What I love about this is you had a model that was already working so you know that your mentors are helping. When you first started building your copilot, were you trying to augment what your mentors were doing? Were you trying to scale what your mentors were Doing, like, what was your initial seed of. If we build a copilot, it'll do what?
[11:35]
B
Well, we had ideas of completely automating, mentoring, having an LLM with a live kit, synthetic person avatar, whatever you want to call it. But yeah, I think we had quite some wild ideas, didn't we, Elliot? And I think one of the keys to when we began was trying to bring that back down to earth and trying to understand what the users actually need as opposed to what's cool. Yeah. As we've mentioned throughout the chat, it's more discovery, giving users the tools and ability to develop that momentum themselves.
[12:11]
A
I can imagine with a copilot, especially if the goal of the copilot is to simulate what a mentor is doing, that's a big footprint. So tell me about the earliest days of the copilot. Like, how did you decide where to start? How did you get feedback? Was there an early prototype? Just what was the beginning of this?
[12:29]
C
I think was a bit of an analysis paralysis, especially from my side, in terms of where the market was at the very beginning of this, because we, we've been. We've had our finger on the pulse of this since you could possibly do it. I think we've had a. We've had a bit of a beta around a Kanaka, the very early AI coach when ChatGPT APIs first came out. And we're very quick to work on that and work through that. And I think we. We intentionally did that in a restrained way because we've always been very nervous, as Dan's mentioned, around the safeguarding element. Overly nervous. Absolutely overly nervous. And I'd like to keep it that way, to be honest, but it's. It's something that we've always been very excited about, but it's like, how do we make sure that we can scale it and use, like, the best tools possible for it? And I think when we came to scoping out a copilot and finding out that this was a problem, I think essentially because the data was telling us that people getting stored between, let's say between action, different tools. They do certain tools. Let's say, for example, someone said, I know, but I should be doing more, but I don't know what to be doing specifically in terms of, like, interview feedback would be, you're saying, like, I finished XYZ course, but now what they know what to do. They have a really good conversation with their mentor about a particular opportunity, but then never applied to that opportunity. So it was like we were trying to try to break down the barriers of, like, different elements. And what we did find is that it was the same problem that we were focused on at the beginning is that these students are very driven, but they are. I do think there was that element of a lot. I need to keep this momentum going. I need to quite understand, like, why I'm fitting into this space. That mentoring relationship was for them, like, a very, like, positive, like, experience. But it's. Why would I go on and take that into a master class? Why would I go and engage the community beyond this? Because for them, I think it's quite a big step to go beyond that initial place.
[14:16]
A
I think the thing that resonates with me about what you're saying is that you can have a mentor telling you this is the next step, but if you don't in your life see people taking that step, there's a lot of imposter syndrome, a lot of doubt. You might even have family members telling you not to take that step. Like, I think this is for people that haven't experienced this growing up. I think it's a little bit hard to relate to it. But, like, I went to high school with kids where they were told, don't go to college. You're not good enough for college. And so I think that's. I can definitely relate to, like, they could have a great conversation with their mentor, and they're like, yeah, okay, I know what to do. And then they don't do it because there's a lot of competing forces.
[14:56]
C
Yeah, for sure. It's one of those ones where, like, even when the. You're working towards something, as you say, you'll just be, like, constantly pushed back. It's even teachers as well, because you say teacher, sorry. Like, people. They'd be in school and people would want to apply for something. And a lot of, like, state schools, I've had family members that have gone kind of state schools and been very talented enough to apply for opportunities. And they literally get to the point of that before the exams have been told, just rein it in a little bit. Don't set your standards too high. So it's. That can definitely, like, resonate with that in terms of how you're feeling. And I think, Theresa, the thing that we're focused on the start is that we were. We did approach different, like, different ways around this. Like progress trackers, like nudges, or like, recommended journeys, for example. It could be. It could have been different routes towards this, like, elements of gamification. But we did realize quite soon after conversations, we speak to students every week. And there's something within zero gravity, like everyone has to be a mentor on the platform, which is a really nice tradition where we sit down and we mentor students and it's a fantastic way from my perspective to have a regular conversations, but also for everyone else's just to test the platform and also make sure that just hear these members stories. It's so easy within the I think in our very like data hyper focused and I guess more like kind of technical space to get lost on the data and forget about these amazing stories that every individual member that can get goes through our system and then. Because at their end is like incredible. I think that's something that was another reason we went down the AR route is because every story wasn't linear, everything was unique, every ambition was slightly different. And it's this like context and we're like, there's a lot of sensitive stuff obviously we will not share with the LLM and stuff that really makes them. But when you're looking at where they are, where they want to go and the kind of that additional concept, piecing it together is very different per student or kind of where they're coming from, what school and what destination. That's something that was really exciting at the start. But that's also when going back to Dan's point, we go like, wow, what's the coolest thing we could do here? Like how far do we push this? Because we've got super talented guys like Dan and we've got a great kind of squad of engineers who are very passionate about using AI. We had to be very careful not to get carried away by the hype.
[17:10]
A
Yeah, for sure. So tell me a little bit about what was that starting point? What was the. There's a phrase that's been coming up on our podcast of what's the first bite of the apple.
[17:21]
C
I'll embarrass myself by saying like how I started it and then lead through to Dan who grounded me in the process. I think in my naivety at the time. Absolutely. And just looking through like the different options that we had as I look I went down the Microsoft Azure route where they had this very interesting setup to basically host like vector databases like RAG architecture and the different LLMs. It was really interesting to see the different elements we could use for this. And that's where I was first introduced to all of this stuff. I was like, this must be the way to do it. This must be the best approach to building LLM for as you say, like a problem space. As like, vast as our size. And I definitely just got a little bit trapped in that mindset of, look, this is like the big picture stuff that we need to do. This is the deep, like, more deep, technical things that we need to do to build this. And had to kind of rein it in a little bit, take a little step back. So it was a really good. I recommend any pm, anyone building AI LLM, like, anything to go down that route and go down a complex route and then work backwards. Because that is, for me is like a. Have been a really good journey. And I feel like now looking at what we're building, like, this is fantastic. This is what could be ahead of us. It's like just doing a little bit of extra discovery on that. That further route. So I had to go down this route, then talk to Dan, come to the realization that we might not need all this, like, kind of fancy stuff at the start because we've got the data, but it's not that complex what we're doing right now. We structure it in a correct way. We can do this in an interesting space. And it was very similar to prototyping as well in the sense that Claude code. Lovable, pure, like, addiction. From my side, I absolutely love that stuff. And I think, if anything, it was just like I had to bring myself in again because it was like the amount of things that you can do and how the. The opportunities and the speed that you can do everything, you have to be very careful to not back yourself into a corner. And I find that to be a very big trap with, like, lovable prototyping at the moment and, like, the ease of chord code and why it's super important to have these regular conversations with not only your members and your user base to stay grounded, but also, like, engineers and just be like, look, this is what I'm thinking. This is where I'm going. It's so easy to sit down for an hour, end up in a certain space, and need to be taken somewhere else because it's just moved so quick and it's so effortless now.
[19:47]
A
Yeah, I think you're touching on something that is. It's both really fun about where the world is and how fast it's moving, but it also can be pretty dangerous. I think we've had a lot of episodes where teams acknowledge they started with too much. Right. We had one team talk about they jumped right to an embeddings database and different rag strategies, and then it ended up that keyword search actually worked better. And it's hard Right. Because this technology is fun and there's a lot to discover and to learn. And I think, Elliot, your story of just being really curious and jumping in is a lot of where actually the name of this podcast came from. Marty Kagan talked about this idea of you want your engineers involved in discovery because they know what's just now possible. Right. And that phrase, like, just resonated with me so much. And what really resonated with me was it's not just engineers who need to know what's just now possible. Like, good product managers, especially ones that work in technical spaces, also need to know what's just now possible. But then we have to be careful about not going too big, not going too complex. And any engineer knows the risk of over engineering before you need it. So, Dan, let's jump to your take. So Elliot's super excited about AI, and you have a grand vision of what you can do. What was your reaction?
[21:08]
B
Yeah, I think inherently I'm a lazy and simple guy, so I wanted my job to be easy. No, I'm joking. Ultimately, I just wanted to see what we could do just in terms of context management and tool calls. I didn't think. I know RAG is great. I think it's the solution to the inevitable pitfall that the LLMs currently have, which is training on old data and extending its knowledge. But in our case, I felt like we had the power in the data that we already had, and it was more just about giving the LLM the tools that it needed to keep coherent conversations going about, for example, a job opportunity and then giving it enough context about the user so that it could actually guide and suggest what the next steps could be for them.
[21:59]
A
Yeah. So, Dan, based on what you just described, it sounds like even your earliest prototype maybe was already agentic. Is that true?
[22:06]
B
Yeah, for sure.
[22:07]
A
Okay.
[22:07]
B
Yeah.
[22:07]
A
So you started from the beginning of looking at it as tools and context, which is great.
[22:12]
B
Yeah.
[22:12]
A
Tell me, like, what did that earliest. I don't even want to say. V1. Maybe it was just a prototype. What did that look like?
[22:19]
B
So it was a button on a card on a job. So we've got what I would class as like a jobs board, and we've got jobs cards. On the jobs board, it started as an entry point on the job card. So it would start off. We played around with what was best here, but it would. At the time we decided we would start off with a structured output. So we had to build the tools firstly to get the job context and the user context into the LLM. First, but then it was just about keeping it on track. It was just about giving it the right tools, the right logic to decide when to give these tools. So a good example is for example on the jobs board, if a job hasn't been updated or job details have not been updated since the last conversation, there's no need for the, for the LLM to go and grab that tool. So we were actually able to do a lot in terms of the context management side just within the system itself and not go really convoluted with any rag or anything like that because that would have been a massive undertaking for such a small team. It was just me and two other devs on this project and we had a three month timeline on it.
[23:34]
A
So it sounds, help me envision like what was the interface for this? What did your students experience?
[23:41]
B
So they would click on their job cards. I've kind of chat bot modal would pop up, it would generalize, it would generate a summary on the user suitability to the job. So we use a structured app for this because we felt being able to structure for example an overview, analysis, match analysis and then key strengths and weaknesses of the user's career profile, for example, is like better grounding for the rest of the conversation. If we were able to do that. Yeah. So that was one of the first things we did. So the actual first interaction is very much down to the LLM. The LLM will generate something like basically an overview of your suitability of the job where you could improve on your career profile. Here are some mentors that you could go and talk to. Here's some learning content, master classes. So that was the first iteration. We didn't even have a back and forth really. We were planning on actually going for just this one shot solution where it would hide on a card and you could click it and that job summary would pop up to them.
[24:46]
A
So this is you have a student who's looking at a job and you're taking that job list and comparing it to their background. Is this like they're a high school student and this is their desired profession, like after college?
[24:59]
B
Yes.
[24:59]
A
And you're helping them with what do they need to do to get there?
[25:02]
B
Yeah, exactly. I think one of the biggest things we've seen from the past four years is that we've got talented people. It's more about getting them to put in good applications. And that was like a bit that was missing. So we quickly built a career profile and platform because we felt that would give us more data, more grounding, more Ability to just more avenues in which we could use that data and that tool to attract partners or allow or generate momentum within the user's journey. And then from there, that was the key piece of information we wanted to take into the LLM to see what we could do with it. So a lot of the job was getting that data in a document format so it could easily be pulled and ingested by the LLM when it needed, so that it could just. We didn't have it on the back and forth chat, but once we realized we've got the basis for a back and forth chat, we quickly just build that in and then put all the safeguards around it because we've had experience doing that before, basically.
[26:11]
A
Okay, let's get into like just this beginning piece because I already have some questions, especially because you said there's no rag. So it says like you have a job listing or a career listing, like something that a student might be aspiring to. You've helped them build out their own profile and your first kind of prototype was let's give you feedback on how well you match. But it also sounds like you're telling them this is what you might need to do, you might need to take these classes, you might need to. Who knows what else. Where are those recommendations coming from? That seems like you have to have that data to know what path the student should be on to be able to get that type of job.
[26:52]
B
Yeah. So we do have some gold standards, but as Elliot mentioned, every journey is slightly different. So what we try and do with the learning and masterclass recommendations and the career mentor recommendations, we already have that data in document formats and we've been running recommendations just based on a search kick algo, just to score and match people based on those two documents. So we just piggyback off that at the moment. But we are wanting to move into RAG in which we are able to embed those documents and then have full on.
[27:33]
A
I feel like we have a terminology difference maybe. So you are using search, you're just not using like embeddings, databases.
[27:41]
B
Yeah, we decided quite early on that we wouldn't go down that route.
[27:45]
A
Okay. I was trying to figure out how there was no search step. But there is a search step.
[27:50]
B
There is a search step, it's just not embeddings.
[27:52]
A
Yeah, Okay, I gotcha. Okay. I would argue search is rag. You're searching and adding things to the.
[27:58]
B
Yeah, exactly.
[27:58]
C
Yeah.
[27:59]
B
Okay.
[28:00]
A
Okay. So you do. Your very first prototype was we have a job, we have a user's profile, we're looking at matching and then based on those, the gaps maybe you're searching for. What do we recommend to this student?
[28:13]
B
Yeah, so that would include information about recent updates on the career profile, their recent job interactions. So what kind of categories of jobs have these people been looking at? And also just general interaction data about what areas of the platform they've interacted and not interacted with.
[28:31]
A
Okay.
[28:32]
B
Yeah.
[28:33]
A
And how did that first prototype go? Like, how did you. Did kids interact with it? I know you said there was no back and forth, but I can imagine is trust an issue? What do you mean? I got to take that class. I don't want to take that class. Tell me, what was the reaction here?
[28:48]
C
Yeah, sure. So it's an initial conversation, I think again, I think there was a. There was an assumption from. A mis assumption from my side that we wanted to make this relatively subtle in platform. I've learned a lot from people building AI tooling at the time that it needed to be not like an additional tool, but more of an enhancement of existing tools. Let's say it lives on a certain part of your design and it doesn't really get in the way, but it's a nice addition and that's something. Again, I think going back to my point around, we were being probably overly careful around this saying, look, let's just keep it in here. Let's not make it too in your face. Our interaction and during the testing, a lot of the kind of. We went into testing, really excited. Look, we know what it's doing. Gone through what Dan's just said, let's just have these conversations with students. And immediately just wasn't sensing that kind of sense of excitement back. And it was kind of like, oh, yeah, this is cool. Thanks for. Appreciate your kind of recommendations based on this. There was that kind of excitement around, like us like having the context around them and their engagement. And that was, that was the kind of the big moment for them. But there was no, there was never a, oh, this is an AI doing really interesting things for me here. A moment. It was just like, oh, that's nice. Cool. Zero gravity. Have that. They've got a cool feature. So came away from those discussion thinking, we haven't really had that moment, that co pilot moment that we've initially set out to solve. And through further conversations, it came more and more apparent that we needed to have more of a moment, more of a kind of like an interaction, more of wow, that is that. I understand one way you've retrieved that information from. I feel empowered by your Analysis of this based on what I've done. And I'd love to dig deeper. I'd love to like just ask more questions about this and discover a bit more about myself. And that's where we then came to, I think is close to the final version of it.
[30:38]
A
After those discussions, I can imagine, like from what you're describing, it sounds like maybe they didn't really realize it was personalized to them and that like, you might need a little bit more interaction to like express your objections and let the LLM continue to tailor it to you. Is that a fair characterization?
[30:58]
C
Absolutely. I think we probably hid the LLM magic a little bit too much by trying to make it smart. So, yeah, that's definitely something that we found out. Because I think the one thing I was also learning from having these conversations is that students use projects in like Claude interactivity. I'm a big fan of projects and. But they don't really want to be the ones managing the knowledge there. So in a way, if you think of our copilot as a bit like a kind of like a project on top of our. Within our platform, well, we can deal with the knowledge stuff. Like you can like, my vision for this in the future is that they can potentially interact with that at some point, maybe like memory management, but take that element away from the user and it's just really nice. They're feeling the power of projects without having to manage it themselves and then having things out of date, as Dan's mentioned. So that's what kind of what we were leaning towards. I think we didn't quite. We sacrificed a little bit of the chat interaction by going too far the other way.
[31:59]
A
Yeah. So did you like, was your next step. You just opened up the ability to chat?
[32:05]
C
It's not straight away. Again, we were. We did think this is the next best step, but we explored a lot different things. Look, do we go. Voice input outputs were really picking up. They weren't quite there. I think they are there now. I do think that we're seeing a lot of really interesting stuff in this space at the moment, but not quite there when we were doing a lot of the discovery work around this. But it's just like one. What kind of input output format do we want to go with? Do students actually want this? We did have, as I said, very early chat function on the platform, which was popular, but not enough for us to go. Look, let's go fully down this route. Do we use. What kind of approach to this do we use? Do we use kind of Socratic back and forth. How do we embed safeguarding into this? So we were very apprehensive about going down this route initially. And to be totally honest, it's like it felt like everybody was falling into the trap of AI chatbot, like AI back and forth. So we're like, look, we need to think about this properly. We need to. We've got the tools, we've got. We've done a lot of discovery where we do understand this space now let's not jump to what everyone else is doing and create like a LLM wrapper. A lot of these kind of apps are now like ending up being. So there was a lot of kind of conversations around it. We did eventually come to conclusion after a lot of user interviews and testing there, that we're going to do text input output for now, Voice came with its own challenges and I'm still determined to explore that at some stage. But again, we were getting a little bit ahead of ourselves on the tools available in that space. And with the text, what we are doing now is we're, we're still very prompt, heavy, so still guiding them towards the initial conversation. So where Dan's kind of mentioned, you interact with that opportunity, that button, that door, which I'm calling them, that doorway still exists within the platform because it's a nice kind of like mental model going into that conversation and having that back and forth, converse, like chat and getting the gleaning results from that. But it's also like, where do you sprinkle more doors throughout the platform and more entry points? Or do you expand that into more of a modal interface where you can like mobile first, engage with it where you want to. So we are trying to guide them through different prompts in the platform because I do think there is, there is a massive risk still with having empty text box of just having a conversation. We know probably more than most people in this space that you're asking for trouble giving a school student an empty text box on your platform. We've experienced that pain and we've built safeguards around it. But yeah, I think having that guidance to the extent is still good.
[34:45]
A
So when you say you're really relying on prompts, you don't mean. Let me make sure I understand. Like in this example we were talking through of maybe a job listing you had started with. Here's a summary of how it matches your profile and what you would need to do to be on this path. I'm assuming when you say prompts, that's like the beginning of A chat now as opposed to just a static. Okay. So they're not facing an empty box. You're basically telling them something about this opportunity and then they can interact and ask questions and dig in.
[35:20]
C
Yeah, we're helping them start the conversation and then they can respond how they want to at the moment.
[35:26]
A
Yeah. And is it. I think there's a lot of fun things you could explore with asking them questions rather than waiting for them to ask questions. I'm curious if like in this job example you could just give a summary and let them ask questions, but you also could give a summary and then ask them a question. Tell me a little bit about what you're doing there and what that interaction is like.
[35:48]
C
So in terms of like where we're like talking to them and like how we actually kind of start the conversations, it's. Yeah, so we've got that summary and then what we will do is we have quite a lot of control over wear. Let's say we want to have a follow up. We have explored like that kind of follow up system message. Do we want to add more kind of guidance around that initial chat? One thing that we have like explored is like what happens if, let's say someone has done something since receiving that initial interaction. Do we want the AI to then be like, I noticed that you updated your career profile, which essentially is like an in platform CV builder, essentially. It's like you want it to feel like it is there with you, engaging with you and encouraging in a little way. And I think one of the names that we did have with this originally was like kind of career coach, for example, because we want to be able to feel that back and forth attraction. I think we have that first starter conversation, we want to see better follow ups. But the follow ups have been a little bit challenging to be honest, because they will have an element of more randomness. I think LM is very good at having that initial response from a student and following the conversation through that way. What we've done is that we've built this. I'm sure Dan can go into it a little bit more afterwards. We build this very like from an admin layer on our side. We want to be able to control most parts of it, the most parts that we can possibly do. So in terms of if we wanted to switch the model tomorrow, we'd want to do that and be able to know exactly how that impacts the different outputs and things like that and the different system messages as well. So we have a different system message for responses and how that Kind of like people that engage, go back and forth that. But when we try the follow ups, the original conversations, those starter prompts, we found that it was getting a little bit confused in terms of the context. So it has been a bit of a tricky one in terms of like how do you navigate conversations? When do you reach out to the student? Does it come up as like a nudge? I don't know. It's something that we have been looking at.
[37:45]
A
Yeah, I want to get into all of this because I think I have some. I'm very curious about how you're managing context, but give me. Before we get into the technical bits, I want to fast forward to where you are today. You call it a career copilot. So I'm assuming it's evolved quite a bit than just this. Here's this job summary. Give me the high level of like, how is a student interacting with the career pilot today? And then we'll dive into the technical bits.
[38:12]
C
So today what we're finding is that it is tightening the kind of the orchestration across our different tools. So like where we do have that kind of original. I don't want to call it like a kind of summary from like an opportunity and allowing them to understand the opportunity a lot more. We're finding that people are doing the work they need to do before applying a lot better than they were before. Whereas before, let's say students would land on an opportunity page and spray the Apply now button and just click it, send off that first application that they probably used by working with ChatGPT anyway, found like a lot of students were doing that unsurprisingly in the first instance. So what we wanted to do is it was quite risky but the same time we wanted to make sure that we were doing it in the very controlled manner as a result because we did want to add certain steps before they got to that one like cta because that's an important one for us. We don't want to block them to do that because that would be a disaster in terms of like our partners metrics. But we did see that people were going away from it and having those conversations and editing their career profile and adding certain skills that are relevant skills to those roles, asking more questions in like the community, which is something that we also wanted to drive as a result of this. Asking relevant, having relevant conversations with their mentors as well. I think this for me is the one part of this whole thing that excites me the most is taking that knowledge from the co pilot LLM and having a productive conversation with a human being. I think there's something really interesting like having that in between. Whereas before we wanted to go down there. Oh, let's just, let's create like an actual AI mentor on the platform. Having engaging with this like co pilot to be more like, oh, thanks, I'll take this from you now. Having short interactions, back and forth interactions and engaging with the platform as a result. Because another one of the risks was what would happen if they just started having long back and forth chats with this feature, just ignoring everything else we had to offer. So I've been really happy with how people have dipped in and out of it so far.
[40:16]
A
So it sounds like it's still embedded throughout your site. It's not that they have this wide open chat interface. It's always in the context of they're doing something on your site and there's this pilot that's like helping them through the action they're working on.
[40:30]
C
That's correct.
[40:31]
A
Okay, now let's get into the technical bits. Dan, tell me a little bit about what's happening under the hood. What does the copilot look like today?
[40:40]
B
Yeah, so copilot at the moment has a wide variety of tools that has access to. A lot of them are just data fetching tools. So they just go and call either our search kick or the database to get data to drive to formulate that context. I think one of the big areas we decided to tackle first off was that context management. We didn't want to get to a place where when we did open that back and forth chat, we weren't in a place where that context was getting managed and the token counts would just get blown out of proportion or just compound as the chats got longer. So we spent a long time being able to manage our contacts. So for example, removing tool calls if we think they're irrelevant now for the conversation. So if, for example, throughout the chat we had five tool calls to the career profile, you might not need to put all that into the next response
[41:44]
A
in terms of compacting the current conversation.
[41:47]
B
Exactly. Yeah. And we are not reinventing the wheel here. Loads of things that we are doing. You'll see a lot of the top tools doing so, for example, summarizing historic messages so that it just comes compact and flattened so you're not exploding the token counts then removing irrelevant tool calls if we don't. For example. Yeah, as I said, another one being the job descriptions like if they haven't changed or if it's got context of an old Job description, like we don't really need that to be there.
[42:21]
A
I want to give a little bit of context for listeners. Right. Like when we chat with ChatGPT, like in the web, I think we have some awareness of a context window. Right. Every LLM has a fixed amount of space at which it can take in one conversation. I think what maybe some people don't recognize is when you have a chat interface, you're actually sending the whole history in every next request. Right? So the LLM is still just getting a message and responding, but as the chat interface builder, you're responsible for maintaining that history and sending it to the LLM. And so what you're suggesting is on every turn you can kind of mess with the history to compact it, to make it, to manage the size of
[43:10]
B
the context 100% and the coherency of it. Because sometimes like one of the first things we did was enable, because we're using OpenAI. So we were on the platform, we started a new project, we enabled the logs so that we could see specifically what was. We had that observability in house. But for Elliot, for example, like he could see that all in the logs. So within the logs you can see in this reply, this is all the context that was given to it. And then you're like, oh, like why is that in there? Do I really need that in there? Do I need that much going into it just to respond to a quick and easy answer? Yeah. The other interesting thing maybe that people don't realize is that we like, picking the right model for the job is super important. Like we, for example, the structured output, that initial job analysis that we run, we actually run on GPT5 nano, I think it is, and then on our replies because users expect a bit more of a quick back and forth where we're on a lower reasoning model for that. So yeah, behind the scenes that's basically what's going on. Again with tool calls, there's nothing special going on. It is literally just application logic to be like, oh, this has not been updated, don't fucking take it in. So yeah, it's weirdly one of the interesting things is that when we were building it, we were like, we're coming across these problems that we think it's so hard to solve, but they've been solved already and they're not really that complicated.
[44:44]
A
Yeah, I think the hard part is just getting over the newness. I've never done this before. Okay, so how do I jump in? Okay, I want to go back to what we were talking about with the context window stuff, because I think this is a skill people really have to learn and like in long conversations you have to manage really well. So I want to double click a little bit on your removing tool tool calls. So I want to give an example of this. You mentioned a lot of your tools are searches. I can imagine with searching you're using pretty standard search metrics to determine did it find the right thing. With search we always return more than we need. And so I'm assuming that's an example of like on the next turn you don't have to include all the results that you didn't end up using, you're just including the result that ended up being relevant for the conversation.
[45:38]
B
Yeah, exactly. I think a good example again, I bring back to the career profile the CV we have on platform. There's a lot of information on that cv, so. So we have a hash representation of that data. It's just nested JSON and at the beginning we actually just allowed the tool call to fetch that entire document and then just feed it straight into the LLM which obviously as conversations got longer that bloats the token usage because these documents can be quite large depending on the user. So we actually broke it down to have the tool call do a comparison of the previous data and the current data and then literally just give it the changes made rather than the entire document. So it doesn't need to go and look at two documents and think, oh, what the fuck's the difference? We just presented that so that when you look at it as a conversation, it's just more coherent. We're not applying any scientific reasoning to this at all or anything. It's just, does it make sense?
[46:48]
A
I love this example for a very specific reason. I've been writing a lot about CLAUDE code and trying to get product people to use Claude code personally. And what's motivating this is I'm learning as I use AI myself in my day to day productivity, it's actually teaching me what I need to know to build good AI products. And I think what you just described is an example of this. If I'm in my day to day life using an MCP server and the tools aren't well designed, it blows up the context window and you start to learn like what is good tool design.
[47:21]
B
Yeah.
[47:22]
A
So that when you go to build your own product, you're aware like I shouldn't return this whole profile. It doesn't need the whole profile, it needs to know, did this field change? I'm going to return just did this field change. Yeah, yeah, I love that. I can't think of another technology. I'll be curious if you guys can think of one. I can't think of another technology where we can just play in our own personal productivity and build the skills we need to build production products. Maybe engineers. Engineers code is a hobby and then that helps them in their job. But a product manager isn't coding as a hobby and then it makes them better at their job. But a product manager can use Claude and then learn how to build AI products.
[48:06]
C
The only time I've ever felt the same way was going back to my ops background and just like plugging into Zapier, for example. That was the early days of being like, wow, I can do my job, like on steroids here in terms of what the opportunities are.
[48:19]
A
You know what? Zapier is a great example because it teaches you how APIs work.
[48:23]
B
Yeah.
[48:24]
A
Oh, that's. Oh, perfect. See, Elliot, you're smarter than I am. I love that. Okay, Dan, I want to go back to just managing the context window. Actually, a lot of people are starting to talk about we're going to just see our future jobs as context managers. And I think this is like one of the most important skills we've talked about this example of tool calls and what tools return and removing the noise from the conversation moving forward. I can imagine a big part of your challenge too is how do we represent the user? You mentioned you're tracking all their behaviors on the product. Like, how are you representing that and knowing what's relevant for the current conversation. Tell me a little bit about content context management for what you're pulling in for what's relevant.
[49:12]
B
Yeah, we at the moment we're doing recent history. I'm not going to lie. We're not bringing. Some people have been on platform for about three years. Yeah, maybe longer than that. So we're not. Definitely not summarizing their whole freaking journey with us and trying to feed that into the LLM. We. We currently just have. It's the same. We generate a document with key information. Do they have a mentor, for example, how many times? What are their last 10 job interactions? What are the job titles? What are the categories they've been in? What masterclasses have they looked at recently? We. It's also important to say if they haven't, if they're not engaging with that part of the platform, we want the LLM to know that as well. So we, we don't just give information where it actually has the information. We will actually say no, like zero. Or we won't give Nils because we don't like giving NILS to, to the LLMs, but we'll give it like some default text or something as opposed to just get giving like an incomplete like document data. So we, I think we go back as far as three weeks, we try and collate three weeks worth. And again like if the conversation has been going on for longer than the week, we will get rid of hide that previous tool call and let and give the LLM the ability to call that tool on the next run. So it's. And that. That's just gated by application code. Like it's nothing that the LLM is doing. We are simply going, oh, we think you need this tool now. Here it is.
[50:52]
A
Ah, so you're exposing whether the tool is available or not.
[50:55]
B
Exactly. Okay, yeah. So that was another technique that we've tried to use throughout and we think it works. Yeah, at scale that would probably work to bring your general token usage down, especially on longer conversations.
[51:12]
A
And then it sounds like you have a primary Orchestrator agent. Is it truly a loop and it's just deciding what tools to call or is it more? I know a lot of people are moving to pipelines where maybe a turn is agentic, but getting the loop to work and be useful has been really hard for people.
[51:31]
B
No, we are just on a loop. So the Orchestrator, it goes through a controller. The Orchestrator's the same, but it's got that logic to. To basically decide. It's got our system logic to decide what tools it actually has available and what context it will have in that next run. But yeah, we don't do anything really on top of that, trying to think whether we. No, like safeguarding is all within the same thing on the same system.
[51:58]
A
Right. I want to get into safeguarding a little bit. So tell me, is that a tool? What does that look like in your system?
[52:05]
B
So we go hard on just moderating inputs and responses, but we also have an external partner that we have a contract with called Unity Unitary. Sorry, is that right? Yeah, I think they're a data labeling company, but they produced an API that blew out. Like it's a moderations endpoint, but they can do images, they can do videos. So we actually use it on other areas of our platform, for example community. So we've decided to actually run all our message history into unitary on a cycle, just so we're not relying on just the moderations endpoint itself. Because some of the data that comes out is not. I Would say maybe not as comprehensive as what we would like it to be. For example, unitary. Really dive into the categories. So, yeah, got this two pronged approach of making sure we're doing our due diligence in moderating the input and the output, but also having an external provider that we can run all our messages on.
[53:11]
A
Okay, so let me make sure I understand. It sounds like for every turn, both for the user and for the agent, you're sending it through a moderator sort of filter step.
[53:22]
B
Yeah, we are.
[53:23]
A
And is that. Are you using this third party for that or are you doing that yourself?
[53:30]
B
We pipe it through. So we rely on the moderations endpoint for that initial user experience, but then we rely on unitary outside of that initial orchestration just to keep us safe and make us happy.
[53:45]
A
I got you. Yeah, yeah, Okay. I guess what I'm curious about is with chat interfaces, it's already hard enough to keep the latency low.
[53:53]
B
Yeah.
[53:53]
A
What does this moderation step add? Like, how are you managing just the perception of latency, maybe.
[54:00]
B
Yeah, yeah. So weirdly, the moderations really doesn't. Moderations weirdly is really quick, especially if it's just text. We're finding that the tool calls seems to add the most latency. Structured outputs definitely do as well, but also just generally the model. But again, I think this comes back to it being maybe a UX problem, which has been solved already by showing the thinking. When these high reasoning models came out, it was simple. You couldn't do anything about that. They needed to think for that long. When ChatGPT brought out the thinking steps and all that stuff, we've done the same thing. So we have it on a stream and then if a tool call comes in, then we're like, oh, we're looking at this tool call and then right at the end it says, oh, it's generating the output. We've also done things where it's like the recommendations might load first at the beginning. So when it's generating that initial job analysis, which takes a bit longer because it's on a higher thinking model, there's already something for them to look at and interact with. So it's more just, yeah, trying to be clever with what you can do on a UX sense rather than anything you can do internally because you might not be able to do much internally.
[55:18]
A
This is fascinating to me because I think you're right to some degree. Right. We have patterns, we're getting used to waiting. There's a lot of good UX patterns. I think this is partly why Claude Code blew up so fast. It nailed this. Right. We all look at the funny words as they cycle through them, but I also, I can imagine with 16 year olds, they're not the most patient in terms of just sitting there waiting for a response. So I love that you're also looking at what can we load quickly to engage them and then maybe progressively load more as it returns.
[55:50]
B
Yeah. We also made the decision to bring it up on never present modal. So it's not like they can't see the rest of the site whilst they're interacting with it, especially on web views. So yeah, it does feel like it's not something that's blocking you from interacting with sites, just something that's aiding you along that journey and keeping your momentum on our platform.
[56:14]
A
Gotcha.
[56:15]
B
Yeah.
[56:15]
A
Elliot, were you gonna add something?
[56:17]
C
Just on the safeguarding front really in terms of what Dan mentioned not impacting the latency, but we are always designing with that safeguarding first in mind. That's all. Like we are originally a mentoring platform and I think critically, I think AI can never make those safeguarding decisions. Right. But it can help us surface certain things. So there's a. We will have observability of every interaction and we are like myself and a few others in, in the company are trained on safeguarding from day one. So it's like how do we make sure that we like really lean on our position as a tech platform having that observability of all engagements in tech and make sure that we, when we roll out an LLM within platform they how we have those smooth conversations with like schools and partners which we are currently having. Because I think a bit of one of our superpowers is saying, look, we are like safeguarding first, we build safeguarding first. And AI is definitely something that's adding a super contentious part to that and it absolutely should be seen that way. But I think people, if you go in with the mindset thinking, look, how do we make sure that we want, we don't put it into any decision making position and also have that observability, then it's quite a nice space to be in because is an additional safety net for members on our side who are like historically will have quite a few safeguarding flags and concerns.
[57:40]
A
I'm happy to hear that you can send it to a moderation endpoint and it's not a huge latency concern because I can imagine that could be a whole product in and of itself. And so to be able to have services available to you, I imagine helps a Ton. Tell me a little bit. Elliot, you said observability a few times there. We haven't really touched on evals. We talked about safeguarding. Tell me a little bit about what you're doing to evaluate quality.
[58:07]
C
Yeah, sure. So we try and see as much as we possibly can. At the moment, I think it's a bit of a hybrid use of OpenAI's tools available, which again, at the start we're like, let's build our own thing, let's build our own kind of version of this in our own admin layer. But again, the tools existed, they are directly tied to the output from the model you're using and at the moment as sufficient for us. I think the main thing is adding your own context to right is what failure taxonomy is important to you and your members in what you're trying to achieve. So let's say for example, when we first started this, I saw it as a really important process to like educate wider employees on a lot of this stuff as well. Because people just see like a bad output, whereas all of us here will see a bad output as, oh, okay, let's walk back, let's find out what the problem is here. What is it going to be next time? There's not like a bug we can fix. Right. It's something that we've got to really pay attention to and tag. And I've done a lot of kind of exploring and upskilling in kind of the AI safety space. Did a really good course with free course with Blue Dot Impact, which I'd always recommend to anybody and they walk you through the whole red teaming process and kind of the we could like green team process as well. And to get everyone upskilled and aware of these risks, put everyone into a room and doing that kind of internal queuing had students kind of work through this as well. And as like we had one red team with one green team were kind of acting as like that as a student would and the red team were doing the total opposite. It was like their job was to destroy this thing and prove that we should never release it. And a few of the things that like the fer of taxonomy points we built around, like hal, which is very aptly named around hallucinations. We had some that kind of like tagging like stale recommendations as well. So like Dan says something that they've potentially already done on platform or something that's not relevant. We look out for like vague responses if an AI goes off on a tangent, something that really struggled to stop it from doing was telling a story. It would just consistently tell students stories and it would at the end it'd be like it gets the end of the story and just say do you want to chat more about your application at this place? And it just talks about unicorns and like Shrek or something. So yeah, it just goes off. It was really hard to contain it in that sense. But tone as well I think is something we found really challenging. But I've been really interested in is like how do you stop it from being overly or underly encouraging for certain applications or roles? Because we, we had a case at very start where a student was applying for a job at a construction company and the AI was like trying to find. It had not that much information on them in terms of their career profile, a cv. He was trying to desperately find something to be like this student's gonna be a perfect fit. So he's noticed that he had a skill in like FIGMA skill. And it's like your FIGMA skill is perfect for like this, this construction business and like that. And it was very like manual kind of in person construction. It found a great reason for it. You know what, actually that's a really good point. Maybe all construction workers to dive into figma but just stuff like that being very like trying to clutch at straws is something that we, we struggle with. But get back to your original question. That was like that categorizing all this stuff and trying to seeing all the data in front of us and from an eval perspective. I think that tagging process is something that we found has made us kept us quite sane because again it's something that you hear a lot about evals. I've listened to a lot of your work Theresa and one of the evals and like kind of building that out from like scratch and how to do it on and your kind of your side things. But it's very overwhelming. But you just need to know what the things you're looking out for and then be able to like tag in that sense. And then we use metabase and things like that to add the visual, visualize trends and so we can really monitor outputs going forward. But yeah, it's overwhelming but can contain it when you start to piece together different parts of the puzzle.
[61:55]
A
Yeah, so it sounds like you're looking, you have good observability, you're looking at your data, you've identified kind of your top failure categories. Do you have data sets that are helping you measure those categories? Do you have code based or elements judge based evals at all. Like how do you get a handle on like you mentioned hallucinations, tone. Like how are you getting a handle on how the LLM is performing on those dimensions?
[62:22]
C
Just by spotting trends at the moment to honest, I think we're still quite early doors. Yeah, quite early doors in that kind of eval process. We are spotting trends and that is helping us fix it. To be honest, from what I'm seeing. But it's not. I'm very intrigued to know how businesses do this at like super scale because it's like yeah, okay, then you start need to. I'm very pro like having human in the loop on this stuff. But I know businesses have used another LLM to analyze certain things and I find that really interesting.
[62:50]
A
But I think you're, you're doing the most important step which is to be looking at your data and to understand the failure modes. I think where, I think once you've identified a failure mode there's three, let's see if I can get this right off the top of my head. Three primary ways to start to turn that into a metric that you can measure. Right. You can create data sets that you can use as your like almost qa. Anytime you make a change, you're looking at the data set is curated to show, to expose that error. So you can see is that error coming up again. But you also can look at can we actually measure this with code? Is this something that we can deterministically evaluate? Or you can have another LLM judge the response and see, see is that, is there evidence of that error? And so that kind of allows you to do it at scale. But I think already what you're doing is like the, it's the, it's honestly the starting point. A lot of teams skip like they jump right to an eval tool with a generic metric that's not fine tuned to their case. But I think like Elliot, you were saying you were interested in learning like how are people doing this at scale? I think they're just taking those failure categories and looking at how do we measure this. And it could be a data set, it could be code, it could be LLM as judge and then they're using that as an ongoing measurement. Yeah. Okay, tell me what's next for career copilot, what's next?
[64:24]
C
I'll start. I mean I think the first things is how do we, I think I mentioned this before, how do we actually optimize on memory? How do we optimize on these long lengthy journalism? One thing that we haven't really mentioned on this in this conversation is students can sign up as a school student and then stay with us until they're at university and applying for kind of jobs outside of that. And we're obviously very like gdpr. First we want to make sure we don't retain any information that we don't need to retain. But it's like how do we make sure that we can allow the LLM to retain that relevant information throughout that process that the member can then have control over in that interface as well. So that's something I find really interesting because you're talking about memory in a lot of products that is relatively short term, the grand scheme of things. How do you do it over six months? How do you do it over like a year? Over a year and do it in a way that the user doesn't feel like invaded and they feel like, they feel like encouraged by that because it's yeah, remembered that I did this and struggled at this and then I've got to this point. So I think that's one of the things we are looking at next. And I think, I guess with that it's more like what can be forgotten as well. What can we just remove from the whole conversation? But memory has been really interesting and something we have already started to explore and comes alongside the scaling of evals as well. As you mentioned, this is one of
[65:45]
A
the most interesting areas of AI products for me is what should an AI remember and what should it forget and how do you represent it so that the LLM can use it well? And I feel like I can't think of a single example of who's really nailed it. I can think of lots of products that are experimenting in this space and it's fun to see the different. Even the foundation labs like OpenAI is taking a very different approach than Anthropic. But this to me is like one of the edges that is most interesting to me of who's going to figure this out and what are the patterns that are going to emerge.
[66:19]
C
I agree. And I saw something recently. I can't remember who the product was, but one of their quirks was that they claim it as a quirk. I just reckon that they had to just remove it from the data from the database. But they were saying that our memory works like a human does. Eventually the LLM forgets and part of data disappears. That's quite nice in a way. But also it's like probably just because you haven't figured out a way of storing the older stuff. So it's like it fades and you have to engage with it again to bring that memory back. So yeah, it's an interesting one.
[66:49]
A
There's also like a UX component of this of like how is it not creepy?
[66:54]
B
Yeah, I'm. We use cursor every day and they've got this memory layer in but they actually, they actually allow the user to have control over that. So even though it will generate the memories, you still have the ability to go and edit and delete them, which I think is quite a good use case for them specifically. Yeah.
[67:14]
C
Flip side as well. Right. Because what can you remove from memory? But what I find is particularly creepy with a lot of these models at the moment is that I'll remember that. I didn't tell you. I didn't tell you to remember that.
[67:26]
B
So deciding what to remember is like one of the. Yeah, one of those.
[67:30]
A
Yeah. I personally much prefer anthropic approach where I control what's in memory 100% but I, I think it's a difference. Anthropic is going after business, going after knowledge workers and they probably want more control and OpenAI is going after consumers and they probably don't even know what. Want to know what memory is. And so I get the different strategies but I'm a little bit. Every time I see ChatGPT say I'm writing to memory, I'm always like what did you write to memory? Let's.
[67:59]
C
What did you remember?
[68:00]
A
Yeah, this has been amazing. Is there anything else that you were hoping to share that we didn't cover?
[68:08]
C
Not really. I think the only other thing we were looking for next was that we're looking at how do we bring more action execution into this as well. I think we want to make sure that we explore the Socratic reasoning space a little bit more from coaching students and also performing actions, bringing. I know, bringing like an MCP into this as well, but not doing it in a way that is going to invade anyone's privacy and is going to be in control of the number as well. Because that's why I think a lot of other products have got wrong. They've jumped MCP and it's just been like, wait, how did you. Why are you asking for this, etc. So that's something that we are exploring at the moment. I'm quite excited about.
[68:43]
A
Yeah, I love that. I've been thinking about this a lot because I am starting to build out like very specific AI teaching tools. But my grand vision is to build a discovery coach that then uses those tools based on where the conversation takes you. And I think this question of using an LLM to teach is like, what are the right questions to ask? How do you encourage reflection? How do you put more on the student to like engage and participate rather than give you the answer? Yeah, exactly. This has been really fun. I appreciate you taking the time. And I know especially we're at the end of your workday, so I especially appreciate you taking the time. Thank you both.
[69:28]
B
Thank you for having us.
[69:29]
C
Thank you very much Teresa. Big pleasure.
[69:31]
A
If you enjoyed this conversation, please subscribe in your favorite podcast app and give us a rating as it helps others find the show. Thanks. I appreciate it.