
A
Welcome to Just Now Possible with Teresa Torres.
B
I'm Daniel, I'm CPO at Tendos, responsible for product as well as design and everything related to our accounts and customers.
C
My name is Matthias, I'm the CTO at Tendos. I'm responsible for everything that is technology and engineering.
B
Excellent.
A
I'm excited to learn a little bit more about Tendos. Tell me about the company.
C
Cool.
B
What Tendos is doing: we are the system of action for manufacturers in the construction industry, so we're really focusing on that. We sit on top of the legacy systems or systems of record that are in place, and we focus on all the back-office jobs that are happening. Everything related to white-collar work is something we come in with our solution to support, and we want to make it more efficient. We're focusing especially on the construction industry because of the high inefficiency and the high volume of traffic that's managed every day across the whole tendering chain.
A
Yeah, help me understand, I don't know that I have context for the tendering chain. What does that look like?
B
If you think about the whole process: you want to build something, you have a project, and you have a project owner who has subcontractors or architects, planners, everyone who's involved in setting up the plan of how to actually approach it. They create first drafts of how it should look, and they define what components are needed. That is often already aligned with manufacturers who provide the materials, the products needed to actually build the building. And then you have wholesale sitting in between. Along this chain there are, I would say, 10 different parties involved. If you take just one line, one to one, there's already a lot of inefficiency: such a project takes a 6, 8, 9, 12-month cycle time just to get responses back through these different feedback loops. And on the other side it's not a one-to-one relationship but a one-to-N problem. You have one project owner, you have multiple parties in the planning phase and in preparing everything, then you're talking to multiple manufacturers per product, and then you also have multiple wholesalers you want to receive a quote or price and an offer from. So it's not a one-to-one relationship; you end up with a huge tree of different parties involved, and you have all these manual jobs to be done happening along the way.
A
Okay. This resonates on a tiny scale for me. I'm going to give an analogy of what I just went through, and this is going to be my context for how I navigate this conversation. We actually recently had some construction done on our house, and we had to repipe, redo all of our plumbing, which is a nightmare. And it's funny, we talked to six different plumbing companies. There's a separate company that's the GC, the general contractor, that does all the finishing work. We actually repiped in November, and now we're working with the general contracting company. They're getting bids from drywallers and painters and people to fix the subfloor. It's shocking to me how much work goes into it. We're talking to one company and they're talking to five other companies, and that's before you even get into materials. They're just bringing in subcontractors. We've had people walk through our house, and we're telling the 17th person, here's where everything is. I can see how it grows, and this is a small project, a single family residence. I can imagine when you're talking about commercial buildings it grows in scope. I didn't realize this is where we were headed.
B
And the point is, that's just one phase of it; there are multiple phases of this kind of back and forth, these alignments. The other thing is that what you saw with your house is just the surface, the workflow of actually doing it. Already around this whole process you have the outbound sales teams of manufacturers talking to your plumber, positioning their product. There's so much going on along this whole process where you can optimize and improve so many things.
A
The other thing that I'll share from our experience, where I can see where software could really help, is the general contracting companies that we've been talking to. One of them gave us an incredibly detailed bid, where we were able to go through and say, no, we don't want this whole room repainted; we just cut into a wall in the closet, let's scale that back. Another GC gave us like a five-line bid: drywall, paint. And we're like, okay, hold on, what are you actually painting? And they couldn't answer the question. So we actually didn't go with them, because we're terrified that you're just going to come back and charge us more money when you realize what's involved. I can see where clearly the first company had way better software that led to a much better customer experience. So I can definitely see how software can play a role. Tell me a little bit about the product that you work on and how AI plays a role.
B
So I'm coming more from a jobs-to-be-done perspective here, just to make it more understandable. When you think about inside sales teams, there are a lot of jobs they have to do. One part is just managing your inbox and the requests, via hotline, via phone, but also the emails that are coming in. Those need to be categorized: what's actually behind them and what needs to be done afterwards. Already in that step you need a prioritization of how important it is to work further into it. It doesn't even mean that you're already completing the task; it's just how important is it to take a deeper look. If you then look into the different use cases, it can be, for example, technical support, a question about your product. But the whole planning or offering use case has been our primary starting point, because we said it's one of the biggest challenges they have from an effort perspective. You get the request, you have to open the email, similar to a letter: you don't know what's inside, so you have to read it to find out what it's about and how important it is. The importance of a project is, on the one hand, of course the potential revenue in it, but also the due date, how important my partner is, how reliable the partner is, and other factors that feed into how I prioritize this request. After this prioritization, I have to transfer all that knowledge into legacy systems. Quoting one of our customers: it needs to be in the CRM, otherwise it doesn't exist, because the information needs to be there. You have to imagine it's 20 different pieces of information for each request that you need to transfer; it's a copy-and-paste job you have to do. Then this email sometimes has 5 to 10 attachments: PDFs, Excel, certain industry files that no software can read unless it's specialized for this kind of problem. And so this is where we take out that information: we extract all relevant positions and context, functional descriptions, and the relevant information, and we then prepare an offer or quote that the person in the inbound sales team can check, correct, and send out to the customers.
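To make those jobs concrete, here is a minimal sketch of the kind of structured record that gets transferred into a CRM by hand today, plus a toy prioritization score over the factors Daniel names (revenue, due date, partner reliability). All field names and weights are illustrative assumptions, not Tendos' actual schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class InboundRequest:
    kind: str                   # "offer" | "order" | "technical_support"
    estimated_revenue: float    # potential value of the project
    due_date: date              # when the response is expected
    partner_reliability: float  # 0..1, how reliable the requester has been
    attachments: list[str]      # PDFs, Excel files, industry formats

def priority(req: InboundRequest, today: date) -> float:
    """Higher = take a deeper look sooner. Toy weighting of the factors above."""
    days_left = max((req.due_date - today).days, 1)
    return req.estimated_revenue * (0.5 + 0.5 * req.partner_reliability) / days_left

req = InboundRequest("offer", 250_000.0, date(2025, 7, 1),
                     0.9, ["tender.pdf", "positions.xlsx"])
print(priority(req, date(2025, 6, 20)))  # rough triage score, not a decision
```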
A
Gotcha. Yeah. See, this is exactly where we saw a huge difference in our vendors of like, how good was their offer and how detailed was it? So that's amazing. Okay, and then tell me a little bit about what role AI is playing in that process. At a high level, we'll obviously get into the details.
C
So AI plays a huge role in this process, because as you can imagine, when you start remodeling or renovating your house, most of your requests will be like: I would like to redo my piping, and I have a house that's I don't know how many square meters big. In some cases it's super specified, where somebody made a list and they know exactly how long their piping is and how many bathrooms they have. But on the other hand, there are also people that say, hey, I have a room and I want windows; how large are they, and how would they fit into our product portfolio? So on the one hand it's about understanding what the request is, what the intent behind it is, and then figuring out what the manufacturer working with us actually offers that would help with that. Because if you're searching for a special kind of pipe and that manufacturer just doesn't have it, because you want copper pipes and they only have plastic pipes, then it's just not a fit. Now, on the one hand you could say, hey, you can do this with structured data. Yes, you might, but your input is semantics, and it's not structured. The product data has some structure to it, but most of the critical information, because it should be readable for humans, is semantics, which was very hard for computers to understand before the age of LLMs, basically. And then the whole topic of document extraction is a huge thing, because back then you would train your own document extractor on large bodies of data just to get to a point where you could have good extraction for these large documents. Now with LLMs, it's way easier to get to a point where, even if the model hasn't seen the document before, it gets an understanding of it, and it makes it way easier to cut it into pieces. This is a huge part for us, because otherwise we couldn't go through hundreds of pages and present just the positions that are relevant for our customers. In the end, we have split it down into tasks, because customers also work differently. You have some products that are way more configured, while others are standard off-the-line products. Making that differentiation and that decision is not trivial, because you need to understand the request and then figure out, okay, what might be the correct product. And then: is there a valid configuration that I can offer, or is it something super special that we might want to produce? And then the question is, can we actually produce it, or is it just something we can't offer?
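A minimal sketch of that split between structured and semantic matching might look like the following: a hard filter handles the structured attributes (a copper-pipe request never matches a plastic-only catalog), while an LLM judges the free-text fit. `llm_complete` is a hypothetical placeholder for whatever completion API is in use, not a real client.

```python
def llm_complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # placeholder

def structured_filter(products: list[dict], material: str) -> list[dict]:
    # Cheap and exact: rule out whole product lines before any LLM call.
    return [p for p in products if p["material"] == material]

def semantic_fit(request_text: str, product: dict) -> str:
    # The request and the product description are both prose; only a
    # semantic judgment can tell whether they actually match.
    prompt = (
        "Customer request:\n" + request_text + "\n\n"
        "Product description:\n" + product["description"] + "\n\n"
        "Does the product satisfy the request? Answer FIT, NO_FIT, or "
        "UNCLEAR, with one sentence of reasoning."
    )
    return llm_complete(prompt)
```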
A
Okay, there's a lot I heard in here that I find interesting. I can see how complexity goes up very quickly. So let me first make sure I understand. Your customers are, let's just call them construction companies, broadly. One of their customers emails them and says, I need something, and that can be in almost any form, because it's a human emailing a company. And then your product is aware of their suppliers, their vendors, maybe their service providers, regulatory requirements, product catalogs, and is going and trying to pull out all the information that might be relevant to this request. And then it sounds like it is actually generating an offer, but it's reviewed by a human before it goes back to the customer.
B
And also here we differentiate by how critical the offer is. You have some, let's say, low-volume ones where the risk is not too high, where only simple products are inside, so you can automate more. But in general, from our perspective, the expert stays in front of the computer: if it's getting too specific, or when it's about configuration, there are too many specific decisions, so it makes sense to have the human in front making the last decision. Otherwise there can be a lot of damage to the company; with a large project you're talking about millions in revenue, or fees that you have to pay afterwards.
A
Yeah, and I can also imagine what's critical about this process is not just those fees, but if the estimate's wrong, either the company is eating some of it or they're going back to their customer, and there's a back and forth that isn't pleasant for anybody.
C
And for you, as the one requesting or wanting to have work done, you're the one that has to overpay because someone in that chain made a wrong estimate.
A
We actually went through this with our plumbing company. We didn't know; we were at the beginning of the process. They came back to us three days in, and part of it is they found surprises: they opened our walls, and plumbing went through the furnace. That's not a great thing; you don't want to find that in your house. And they had to revise their bid, and it was fine. But now that we're working with our GC and we see how detailed their offer is, we're like, oh, now we know what to look for. Okay. So like I said, I can really appreciate this back and forth and the end users' part of this. Let's get a little bit into this. It feels like a very complex problem. An email comes in; even understanding the request seems like it would be hard. Understanding what sources you need to pull from seems like it would be hard. Tell me about the earliest days. How did you get a sense for: we can even do this, AI can help? Was there a prototype? Did you start with a smaller space?
B
Actually, how we started is a combination of things: having the family background in the founder team, our CEO has a family background in construction, mirroring this problem, and saying there is a huge opportunity when looking at top-line but especially bottom-line effects of such a solution for that problem. So we knew there was something in it. And on the other side, we came together with a strong hypothesis that we took as given: AI is the first time that enterprises are open to rethinking their whole workflows. Before, in the SaaS world, it was always: give me the return-on-investment case and an explanation of why I should take the risk to replace SAP or parts of it, or why I have to change the process away from Outlook. That was impossible to do. And we had the strong conviction that AI is the first point in time where enterprises are again open to rethinking their whole tool landscape. That was the point where we said it's a great fit: we have that problem and we should dig deeper. We weren't sure about the solution, but we knew there was something around it, and we then found one design partner that we could work with to figure out what such a solution could look like.
C
Yeah. And maybe to add to that, because you asked what the first prototypes were like: very naive. Just putting data into large language models and seeing what the output would be, then comparing: is that actually something that goes in the right direction, and do we have the possibility to steer it more towards what we would expect? And we quickly found out that we actually can do that. That was the moment where we said, okay, we see the problem, we know the technology might not be 100% there yet, but we're seeing the direction it's going and that we can already provide some value. And that value will only increase over time, because models will get more capable and there will be different approaches to certain things we do that we figured out back then were quite hard. We already see now that some of those things have actually gotten a lot easier than they were before.
A
Yeah, I love this. Here's what I like about this: a founder with domain expertise to identify the problem in the first place, and then recognizing this hasn't been solved before because we didn't have the technology to solve it well. That intersection is why this podcast is called Just Now Possible. And I also like that it's construction. We often talk about how the most valuable companies are often ones that build boring software. If you think about tech, Netflix is sexy; construction, not so much. But you're combining domain expertise in an industry that isn't exactly a leader in technology, although I've talked to several construction companies recently that are adopting AI, so clearly they have problems that AI can help solve, and then you just started prototyping and saw some initial promise.
B
Also, in the beginning we were open to taking the risk that the problem is not big enough. We saw that there was something; we didn't know how big it was. An interesting side note is that in the beginning we got the feedback of, oh, we're not sure how many of these they have. So we validated on the one hand with a sales-driven approach: we started to get into conversations, so we understood there was something in it. And on the other side, talking to the users, we realized the huge opportunity with AI: that they're open to trying it out. We had seen the risk of users being defensive about it, but it actually turned out that after two days of using our platform, the users were doing an internal workshop to give our AI system a name; they were calling it Chassis. We had said before, we don't want to make it a person; we were convinced we should keep it more technical and keep it away from the users. But the users themselves said, we have to give the system a name, because we are collaborating with it. That was kind of an aha moment for us, where we realized, okay, there is more to it that we can build on, layer by layer. Also from a technical perspective we learned more, and from a product perspective we could iterate on the product quite fast. And then came the moment of realizing that we can actually replace legacy software, at the request of the first customers, saying, hey, actually, why can't we just use Tendos instead of our CPQ module? And we were like, really? We didn't think about it in the first place, but it turned out really well. Those were the really first days, and we had such a strong pace. That was the moment we realized, okay, we have to go full force on it, and that was when we also scaled strongly.
A
Yeah, this is such a great example of what we as an industry talk about: you'll know product-market fit when you see it. This idea that they're anthropomorphizing your product, they're asking you to chew up more of their software landscape, that's great. Matthias, you said that the prototype started as literally just putting data into ChatGPT. I think what feels hard to me about your problem space is that it's a lot of data. And this is something that intrigues me in general: a lot of AI products, over time, their footprint is really big. Did you start with this big footprint of a customer requests an offer and you go look at everything? What was that like? There's a phrase that's been coming up on the podcast: what was your first bite of the apple?
C
The first bite of the apple: we looked at a very specific subpart of the portfolio of one of the design partners that Daniel mentioned. It was radiators, but a special kind of radiator, and they have a specific department for these. So we could scope it down that narrowly, because we could just work with that department in the beginning. That made it a lot easier, because it removed a lot of complexity: you don't have to have the rules for all of their products, but rather just one group. And one group is way easier to define and to grasp than a whole product portfolio. As you can imagine, someone that produces pipes might also produce a toilet or a shower or a bathtub, and pipes and bathtubs don't mix very well if you put them into an LLM, because they're two completely different things. So basically we had very little data in the beginning and a narrow case, and then expanded outwards from that once we'd figured one out. And then we wanted more and wanted to see more, and we evaluated more, because we also got more understanding of how other product groups in their portfolio might work.
A
Okay, I love this, because what's happening in my head right now is I'm visualizing an opportunity space and choosing one teeny tiny opportunity to start with: help me figure out which radiator product is relevant to this job, which is amazing. How did you identify that as your starting point?
C
Basically through the CEO and his background. He was also the one that helped us validate the results, because he had the domain knowledge, and he could say, hey, I think this is a good area to start. And then we also approached people within the company and said, hey, do you agree that this might be the correct one, or should we look into a different area because it might be less complex from a rules perspective as well?
A
Okay, so the founder had this use case in mind from his personal experience: orders related to radiators are complex, let's start here. And then you verified that with your design partner. Okay, so tell me about that first step. You've got a design partner. You've got a really clear and specific opportunity. What did that very first product look like?
B
The question here is what we define as a product. Before that, we had multiple versions of a prototype, where the LLM-based approach that Matthias highlighted was primarily about validating our hypothesis that this is a problem and it can be solved by the technology. So we were just taking first steps and asking, is this going in the right direction? I like the hunter's metaphor of: are we running in the right direction, or do we have to reiterate on where we're going currently? We knew there was something; we had to figure out whether it was the right thing. So we did that through multiple iterations of this prototype. The first real product, where users were actually working with it, came at the point where we asked ourselves: do we believe we can assist a legacy system like SAP, for example, as a CPQ, or do we have to provide a web application to the users in order to give them an AI-first experience that we think is the right way? And it turned out that we said, actually, we have to own that interface in order to provide the value that we see now, but also for the future. It turned out to be the right decision because, as you can imagine, you can set up the integration, you can write into the systems; however, from a UX perspective it's really important to own the interface as well, because then you're really empowering the users in their process. You can start with a simple first version of a feature, of a functionality along the workflow; you can take away more and more of the user's tasks; and in the end you can make smart assumptions or smart suggestions to the users, up to the point where you say, okay, we're confident enough to automate this step.
A
Yeah, I love that you corrected my language: this first version really was still just a prototype. I think that's great; I love that reframing. I can imagine, though: okay, you've narrowed the problem space, we're talking about radiators, we're looking at one type of product. But were you already working on this whole flow of: a customer request comes in, do we understand if that request has anything to do with radiators, and if it does, go pull this information? Tell me a little bit about that prototype. What did it look like from an architecture standpoint?
C
So basically, most of the requests that our customers receive are sent by email, and normally they contain some information within the email, and then there's a PDF attached to it. In the beginning, because we knew that document or entity extraction is quite hard, but manageable if you have smaller documents, and really gets hard if you go into the hundreds of pages, in the first version we said, hey, let's limit the scope to actually prove the value we want to provide. Let's not use the big PDFs; please just send in small ones at first. That also means shorter turnaround times, because there are fewer positions to process. And it was quite interesting, because at some point we nailed the entity extraction, and also figuring out what product should be matched to a certain position, quite well for small documents. And all of a sudden our users started to experiment, and the documents grew larger and larger, to the point where we said, okay, now we need to invest more into entity extraction, because all of a sudden the problem is not on the side of do we understand the documents, but rather how do we understand large bodies of data. Which was quite interesting.
A
That is, I love that your customers pulled you into that through their own experimenting. Help me visualize this. So an end customer is emailing your customer and they're including a PDF. What's in that PDF?
C
If we're talking about the large PDFs, then you can think of it like a document that describes a whole building. Even the color of the wall is in there: which flooring you want to have, how high the ceilings are, what windows I want to have, which radiator, what lamps I want to hang. I think the largest we've seen till now was 1,800 pages.
B
Way larger, way larger. Sometimes we're talking about plans.
A
Right. People are emailing you their plans and saying, give me a bid for this part of the plan.
C
It's a plan in written form; it's not like we're looking at floor plans. We also experiment in that direction, because there are actually customers that come with drawings and say, hey, this drawing in the end is a couple of products that you produce; could you please tell me what these products are? We also got an email where someone took a picture of a product of one of our customers and said, hey, I need exactly this product. And of course our customers tried it, and we were like, we're not doing images yet, but we're working on it. That was also quite interesting. So it varies a lot. But if you're looking at the really large documents, it's just a whole construction site, basically, and then it's our job to figure out what's actually relevant from that document and to present it. For smaller documents, it's mostly scoped down to what this manufacturer can provide; it's already scoped down for them. That's also why it's way easier to look at these smaller documents than at the whole document.
A
Yeah. Okay, so the small document problem, they're sending you, like, here's our requirements for the radiator and the big document might be, here's the whole building plan. Figure out which part is the radiator. Is that right?
C
Exactly.
A
Okay, I can see how.
B
And as I said, it can go even further: it's not only PDFs. PDFs were the hardest challenge, and they make up 70 or 80% of the traffic, so we focused on PDFs first, cracking the hardest nut first. But there are also different files: it can be huge Excel files, it can be, as you described, images and things like that. We have to handle those as well, just to give a complete picture of what we do there, because we want to support the users 100%. We cannot say we're only doing 95%, because if you're handling hundreds of requests every day, the 5% case happens every day. And it can be quite annoying if you have to spend an hour on that 5% use case.
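A minimal sketch of what covering that long tail of attachment types might look like: route every file extension to a handler instead of supporting only PDFs, and fail loudly (to a human) on anything unknown. The handler names and the GAEB-style `.x83` extension are illustrative assumptions.

```python
from pathlib import Path

def extract_pdf(path: Path) -> str: ...
def extract_excel(path: Path) -> str: ...
def extract_image(path: Path) -> str: ...
def extract_tender_file(path: Path) -> str: ...  # specialized industry formats

HANDLERS = {
    ".pdf": extract_pdf,
    ".xlsx": extract_excel, ".xls": extract_excel,
    ".png": extract_image, ".jpg": extract_image,
    ".x83": extract_tender_file,  # e.g. GAEB exchange files in construction
}

def extract(path: Path) -> str:
    handler = HANDLERS.get(path.suffix.lower())
    if handler is None:
        # The 5% case: escalate instead of silently dropping the attachment.
        raise ValueError(f"unsupported attachment type: {path.suffix}")
    return handler(path)
```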
A
Okay, so if I got it right, it sounds like step one was: you're going to send us your requests that have a one-page PDF that tells us this is the radiator we want, and we're going to try to tell you whether we have it, and here's the price. Then as you got that working, your customers were like, okay, we're going to try: here's a five-page document that talks about all the appliances they need, and you're going to pull out the radiator and give me a price. And then eventually you got to really large documents: here's our whole site plan, pull out the radiator, give me a price. Is that an accurate evolution?
C
That's correct.
B
And we're working on getting even further. But that's something for later.
A
Okay. And Daniel, you mentioned that as you started working with your design partner on this, they started pushing you: why can't you do this next step? Was that sort of an adjacent opportunity? Give me some context for what they were trying to pull you into.
B
Yeah, in general, we have to think about it like this: we found a problem where we're already helping them a lot in their day-to-day efficiency. But within that job to be done, to create that offer, we learned that there are way more opportunities, either to support the offering process, or the technical support side, answering questions whose answers lie in marketing materials and things like that. So they were naturally pulling us in, due to the fact that it's their job and they are annoyed by these different tasks. They were like, hey, do you have an answer to this problem? Or, here's something else that's related to it. And we quite quickly understood that it's a really interesting opportunity to look at the whole workflow. For that team, you can go from the early stage of the process until the offer is sent out; there are many steps in between that are currently a really manual process, and we with Tendos can really deliver a lot of efficiency there, but also, for example, help generate additional revenue. Overall, the point is that you have so many opportunities along that workflow that we said we simply have to help the team, and then grow within the company to help other areas as well.
A
Okay, so I'm visualizing in my head your customer has a workflow that they go through when a request comes in for a radiator. You had started with this one piece and they were like, yeah, but we have all these other steps in the workflow. Can you help me with the other ones?
B
Exactly, yeah, exactly, exactly.
A
And so you started to expand the steps you were automating for them, while you were still in the context of this single product, radiators.
B
Exactly. We started by actually working with them on site; we were sitting there for one week or something. Before that, in just a few weeks, we had built the very first version and put it live, because we were eager to see our application in live usage. Then we went on site and worked next to them for one week, looking at how they were actually interacting with our software, but also looking into the other tasks around it: where they had to open Outlook, where they had to look into other folders, where they had to open SharePoint, where they had to work with Salesforce or SAP and so on. It was eye-opening for us, mapping out the process and realizing, oh my gosh, if we rethink the whole process, we can help at so many points. And that was how we developed the product and the value proposition, how we could position it: we realized that in this whole area there's so much we can do, that we can improve over time based on customer feedback and where customers are pulling us in.
A
I love this. I love that you were on site with your customer, just observing all these steps. I imagine that week you learned more than you could have learned doing almost anything else, which is amazing. Okay, so you've got a first customer, you've got a first use case of a product, you're starting to understand the end-to-end workflow, you're realizing our product can actually handle a lot more of this workflow. What did it look like to go from full workflow for radiators to full workflow for other products? And then give me a sense of where you are today. Do you work with any product? Is it still select products? What did that evolution look like?
B
I think the evolution looked like this: really early on, we took all the data that we could find, also online, and ran it through our tests. Matthias built some scripts, and we looked into eval sets and into how well our system already performed for certain areas and problems. So we had a first assessment of the whole situation and the whole market. This was also the way forward: we always put up assumptions and tested against them, so we had a plan for how to approach the problem. This also helped us in the beginning to identify certain development partners or design partners in areas where we said this makes strategic sense for us, and to work together with them on how to approach it, because we didn't have the answer. And the situation today is that we have the technical groundwork, and the system works for any product in general. It's now more about how good the product data is that they can provide us, because sometimes there is more or less no product data at all, where a lot of configuration or personal decision-making happens as well, and we first have to work on standardizing the product data. That's additional workload, where we sometimes postpone one opportunity in favor of others. The other thing is that when scoping or planning our roadmap, we also look at internal enablement, for example for setting up tenants, setting up the systems, and running evals, so that we make sure every company we're working with is working well at this point.
A
Yeah, you're touching on something that I realize must have been really critical in your earliest days, which is having a very clear customer segment. You mentioned radiators, which means you're not working out of the gate with construction companies that build roads, because they're not anywhere in your problem space. Is that true? Do you feel like your company has this culture of: we're going to start really small, we're going to target this specific type of customer, and we'll just trust that we'll grow from there?
B
That's 100% the case. I think sometimes we could discuss whether we're too narrow, but from our experience in the past we always said focus is key. I think in the past we always made the mistake of being everywhere at once. And especially in times of AI, it's easy to get to 70, 80%, but solving the last 20% is really hard. So we said we have to focus and stay focused, and then grow from there, instead of doing everything at once and then having a problem on the product side, but also on the organization side, where we cannot cope with it. For example, we don't have a support person; we are pretty engineering-heavy. We have account management and everything around setup, but we are convinced that building up support teams is the wrong way, because this should actually be handled by engineers and AI-supported processes.
A
Yeah, it's really clear from the story you've shared so far; it's almost a textbook example of starting with a small opportunity, nailing it, and then using your success in that first opportunity to expand to adjacent opportunities and then to adjacent markets. So I really love this part of the story so far. Now tell me a little bit about what the product looks like today. What types of products do you support? What types of requests does it work with? How has your footprint grown? And then I want to get into the technical details of how this actually works.
B
So from a product perspective, today our application covers the workflow for everything that is coming into your inbox. It can be a support request, it can be an offer request, it can be an order. We categorize it, we prioritize it, we put it into the right projects and assign it in the CRM, so we can take over these tasks happening up front. But we also already create the offer for you. As a user, you have to approve the proposal that Tendos has created for you in any case, be it an offer or a support answer; we don't send out an email on our own. The user has to say, yes, submit, that's the right answer, please send it out. So that's the current situation: we can support this workflow. What we're looking into currently is more the technical planning part, where it gets really into calculations and, as you said, drawings and documents, where it's getting more complex, where a lot of unclear requirements exist and you have to make assumptions. And from a customer perspective, we are in nearly all segments of the industry. We are not only supporting, as you said, radiators; we support everything. When you look around, be it the doors in the background, the ceiling, the lights, everything; we can support all segments there. Where we are not currently is, for example, as you said, road work. We had a partner we worked with on an infrastructure project, but we realized that our product doesn't support their use case well enough, and it would be too much of a distraction for us right now to cover it. So we agreed with them that we will follow up in a few months, going back into the infrastructure topic, but for now we really have to focus on our current use cases along the workflow in the segments we're working in.
A
Yeah, it sounds like my plumbing example was pretty appropriate. Okay, let's dig into this a little bit more and get into the technical details. Let's say I'm a customer of your customer, and I email and say, here's what I'm running into, I need a bid. And I'll share what's in our bid: they're removing light fixtures, they're getting a painter involved, they're doing drywall, there's materials for piping. It's complex. What's happening after you receive that email? Give me a sense, from an architecture standpoint, of what's under the hood.
C
So the first thing we look at, because it's the easiest thing to evaluate, is: what's the email about? That's already text we can work with; we don't have to look at any PDF, we don't have to look at any image. So we first try to identify the intent of the email. Are you asking for a manual? Are you asking for some other kind of support? Is it maybe just that something broke and you want to replace it? Or is it more: I want to renovate, and I have multiple products here that need to be fitted into my house. From that point of view, what's important for our customers is to understand how many of these products are in there, so how much value is in it for me if I start working on this. We try to give an estimate of how much revenue is actually in this request, and that's an indication of how important it is. Then we also look at the submission date: how much time do I have until I need to send out an offer? This is all data that we extract, together with who the planner is, who's requesting it, which partners are involved, have you heard about this project before. We extract all of that so we can present it to our customers, because normally they would do it by hand and then put it into their CRM, just to have some sort of monitoring and accountability for what was done. This way, the AI did that, and we know what we did. From that point on, if there's a PDF involved, we look at the PDF, do entity extraction as I said, look at all the position data, and classify it: what is actually requested? Is just one part relevant, or are there multiple chapters we should look into, or even multiple PDFs, because it might be scattered around? From there, we have multiple agents that work together, with different capabilities. Depending on the product we're looking at, we might ask: what are the dimensions? Can we already find out what the product is? Is the product code maybe even in there, because you know exactly what you want, which would be the easy case? From that point on, it depends on the product portfolio which route you go down. But in the end, what we try to do is make the request that a human made as understandable for a large language model as possible, then look at the product portfolio of our customer and find the most fitting products. And it's not just one: we look at a broader set of products and then reason about which is the correct one. And what's important for us is that if you're matching a product to a description, there's not a lot of value in just reporting confidence. Of course it's nice to see 100% confidence, because then you think, that must be correct. But what does 70% confidence mean on "I picked this one"? That led us to say: we'd rather say we don't know yet and show a choice, than say we're somewhat certain and this is what we came up with. I think this is one of the hard parts we needed to solve, because as we all know, large language models try to solve every problem and will give you an answer to whatever question you ask, but it's not always the correct one. But I think we did a good job on that.
And in the end, we just present this in a nice interface, have the user look through it, have them check if everything is correct, fill in the gaps where we said, okay, we just don't know yet, and then they can send it off to their customer.
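A minimal sketch of that abstain-rather-than-guess behavior might look like the following: rank the candidate products, and below a confidence threshold return a choice for the human instead of a single answer. The scoring, threshold, and product IDs are illustrative assumptions.

```python
def match_product(scored: list[tuple[str, float]], confident_at: float = 0.9):
    """scored: (product_id, fit_score) pairs, higher = better fit."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best_id, best_score = ranked[0]
    if best_score >= confident_at:
        return {"decision": best_id}
    # Not confident enough: say "we don't know yet" and show a choice.
    return {"decision": None, "choices": [pid for pid, _ in ranked[:3]]}

print(match_product([("RAD-200", 0.95), ("RAD-210", 0.40)]))
print(match_product([("RAD-200", 0.70), ("RAD-210", 0.65), ("RAD-220", 0.10)]))
```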
A
Okay, so an email comes in, and the most important thing that's happening first is this entity extraction: what products are involved in this request. Is that true?
C
From a work perspective, this is one of the jobs that takes the longest, because as you can imagine, the longer the PDF, the longer you're scrolling and trying to figure out which pages are actually relevant. So if you look at the jobs our customers do, this takes up a lot of their time, and automating it provides a lot of value. It's also the basis for figuring out what the correct product is. But in the end, getting to the correct product is what our customers really want out of it, because that's the automation in the end.
A
Okay, I can see why you thought a lot about that. It's not just that the email has entities listed; there's some semantic reasoning and some matching. I need to understand the need, and then, based on what I understand about the need and maybe some requirements around it, my system has to go off and explore a product catalog. You're trying to find a match between a need that you've uncovered and a product that could fill that need. Is that a better way to put it?
C
Exactly.
A
Okay, so let's talk about this. One thing I'm hearing a lot with AI products is these data layers, where, like you said, you're trying to make it as easy as possible for the LLM to understand. I can imagine you're going from human text in an email, and who knows what generated the PDF, whether it was human or machine, and you've got to find something meaningful. How are you representing what's in the email to the LLM before going to look for product matches? Maybe a different way to put that: how are you parsing that email? How are you parsing that PDF? Unravel the magic a little bit. An email comes in, a product comes out; what's happening in between?
C
A lot of LLM magic happens in between. Basically, we try to chunk this into different parts and make it digestible. Large documents normally have some structure to them; emails normally aren't that long, so they're easier to understand. From that point on, knowing which customer received that request, we can already make some assumptions: hey, this customer produces windows. If there's a toilet in there, it's just not relevant, and it won't provide any context at all for fulfilling the request, so we just throw away everything that's not interesting in that sense. Then, depending on how dense the information in a description is, we also look at the rest of the document to figure out: is there more information elsewhere? Because all of these documents are made for humans to read, and we are quite good at, here's a general description and here are the combinations I would like to have. A human is very good at that. It turns out large language models are not very good at that, because they forget really fast. And part of the work is retaining that context based on where you currently are and what's requested. I hope that unravels it a little bit.
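A minimal sketch of that chunk-then-filter step, under the assumption that a simple keyword screen stands in for whatever relevance model is actually used: split the document into position-sized chunks, then drop everything outside the manufacturer's categories before any expensive LLM call.

```python
def chunk(text: str, max_chars: int = 2000) -> list[str]:
    """Split on paragraph boundaries into roughly max_chars-sized pieces."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current)
            current = ""
        current += paragraph + "\n\n"
    if current:
        chunks.append(current)
    return chunks

def relevant(chunks: list[str], manufacturer_terms: set[str]) -> list[str]:
    # A window manufacturer never needs the chapter about toilets.
    return [c for c in chunks
            if any(term in c.lower() for term in manufacturer_terms)]

doc = "Chapter 1: sanitary, toilets...\n\nChapter 7: windows, triple glazing..."
print(relevant(chunk(doc, max_chars=50), {"window", "glazing"}))
```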
A
It does. One thing you said that's really intriguing to me is that the requester, the person who sends the original email, is giving you context to narrow the search space. We know this request is coming from Joe Schmoe; Joe Schmoe works with windows; we're going to narrow our product search space to what we know Joe is interested in. You mentioned there are multiple agents, and you mentioned a little bit about giving the right context to the LLM. Give me a sense of how you would describe your architecture. Map what's happening at each step.
C
In general, I would say, whatever that means for anyone, it's an agentic architecture, where we have a mix of things that just need to happen, that are done in sequence. We need to look at the email first; there's no use for an LLM deciding that, we just always do it. But the further down you go in that pipeline, the more dynamic it has to get, because we try our best to build up this context, but in the end the decision-making is based on data that's retrieved, and that needs to be retrieved dynamically, and based on that, the later processing steps are defined. We think of this as adding capabilities to this dynamic workflow so we can facilitate more of these use cases. And we also look at how complex each of these steps is and whether we need to separate them out as they get more and more complex. Because you can only add a certain number of capabilities to a model before it becomes overwhelmed and just doesn't do the job very well anymore, you try to identify which part of the process is getting more complex and how to carve it out in a way that still retains the dynamic way it's used, while staying focused on the task at hand.
A
Yeah. So it sounds like the early steps in this workflow or pipeline are pretty static; they happen for every email: we're going to go look up what we know about this customer, maybe we've identified some entities and we're going to pull in some information about those. And then, depending on what comes out of that process, the rest of the pipeline is very dynamic. Is there an agent orchestrator that's looking at what comes back and then deciding how to construct this dynamic pipeline?
C
Yeah. We basically use a planning pattern, where we look at the context we have and then, based on the rules we've discovered, try to come up with a plan for how to proceed. Based on the findings in the workflow, we update that plan, because you might go down the wrong road, say, hey, I identified X, Y, and Z, and then need to circle back because you suddenly realize: no, wait, we made a mistake, we're in the wrong category, let's go back and figure out what the correct one is. That's the gist of it, I would say.
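Here is a minimal sketch of such a plan-execute-revise loop, including the backtracking on a contradictory finding and a bounded give-up that escalates to the user. The `execute`/`revise` hooks and the return shapes are illustrative assumptions, not Tendos' actual orchestrator.

```python
def run_plan(initial_plan: list[str], execute, revise, max_iterations: int = 10):
    """execute(step, findings) -> dict; revise(findings) -> new remaining plan."""
    plan, findings = list(initial_plan), []
    for _ in range(max_iterations):
        if not plan:
            return {"status": "done", "findings": findings}
        step = plan.pop(0)
        result = execute(step, findings)
        findings.append((step, result))
        if result.get("contradiction"):
            # e.g. "we're in the wrong product category": discard the stale
            # remainder of the plan and let the planner propose a new one.
            plan = revise(findings)
    # Bounded effort: past this point, hand the case to the human.
    return {"status": "needs_human", "findings": findings}
```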
A
Okay, so what you just said raises what I think might be a pretty critical question. You've got an agent that's exploring, trying to find the right products for this request. How is it evaluating success? How does it know that it chose the wrong category and needs to try again?
C
That's actually quite interesting, because we put a high emphasis on evaluation sets. For all of our customers we have evaluation sets, so we can track how good the performance is. As you can imagine, the more use cases and the more customers you add, it's an exponentially growing problem, and if you're not on top of it, you will have very unsatisfied customers at some point. This also enables us to tune the process quite efficiently: we change small parts of it, run small evaluations, see if it performs on par or better, and then iterate further. That gives us a very short feedback loop, quite good control, and the ability to add more capabilities as time goes on. We also sample the data that we create in production and evaluate on that, so we see actual production data. And after a user confirms or changes anything, we take that and use it as evaluation data, and also for learning how to improve the process. So if you think about the holy grail of the self-learning large language model, we're trying to get to the point where, based on human feedback, we can actually learn something and incorporate it into this dynamic flow. The flow in itself will not change, but the more we can guide it in the right direction, the better it will get. And we can't make this up out of thin air. We could use historical data and work on that, but I think taking the input from the user, making sense of it, and presenting it in a way that makes sense to a large language model is a great approach, because you're getting closer and closer to a self-learning system. Even if it's just in a niche, you're getting closer to it.
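A minimal sketch of that change-gating loop, assuming exact-match scoring for simplicity: run the candidate pipeline over a customer's eval set and only accept the change if it performs on par or better, while confirmed production cases keep growing the set.

```python
def run_evals(pipeline, eval_set: list[dict]) -> float:
    """Fraction of eval cases where the pipeline reproduces the expected output."""
    correct = sum(1 for case in eval_set
                  if pipeline(case["input"]) == case["expected"])
    return correct / len(eval_set)

def accept_change(new_pipeline, old_pipeline, eval_set: list[dict]) -> bool:
    # "On par or better" is the bar before a small change is kept.
    return run_evals(new_pipeline, eval_set) >= run_evals(old_pipeline, eval_set)

def add_production_sample(eval_set: list[dict], request, approved_output):
    # Human-confirmed (or corrected) production data feeds back into the set.
    eval_set.append({"input": request, "expected": approved_output})
```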
A
Yeah. This is one of the things I love about human in the loop at the last step: you get an ultimate measure of, did we get it right? I guess what I was asking was more at the micro, agent-decision level. You've got this pipeline, you've pulled in a bunch of context, and the agent is trying to decide: what else do I need? Is it this product or this product? Which product is the best fit? You said it might realize it went into the wrong category and has to back up. What kind of guidance does the agent have for knowing this product is a good fit? How is it making those decisions? And that's long before the human gets to see it, right?
C
Yep. One thing is that we look at the product data and try to identify the rules and differentiate the products. So it's not just that we look at the parameters of a product; we look at the complete product portfolio and ask: how do these work? What would need to be requested for this to be a correct fit, and what makes this not a great fit? For example, if you're looking at a hospital, you need completely different products than for a residential area. Maybe you just came down that route and said, hey, this is a great fit for your request, and then we review that and say, hey, but did you take into account that this is a hospital? And no, I didn't. So basically we try to mimic how you would have a colleague check your work, but we have agents that do it. One says, hey, did you take into account that this is a hospital? And then we circle back on that information and say, oh no, I didn't, let's reevaluate, because then this is definitely the wrong product.
A
Ah, okay. A couple of questions. It sounds like when the email comes in, you're extracting a representation of what we think the need is, and the agent is looking for a product that matches those needs. Help me understand: how does that agent evaluate a product and product fit?
C
It depends on the portfolio, basically. Some products are very highly configured, where you can say, okay, we know how long it is, how high it is. Others are very well described. If you're looking at bathroom ceramics, for example, they are very well described, because normally you have this design language, and it's also in the product description; you want to be consistent in the design language in your bathroom, we've learned. And that's a very easy thing to evaluate on. So if you say, okay, we picked this bathtub and this toilet, we can look at it and say, hey, these two don't match very well, can we find a better match? Because it will just look nicer. And that's something we discovered beforehand, when looking at our clients' data.
A
Okay.
B
To add here: one thing, of course, is quality measures. I think this is specifically an advantage of being in this vertical, that we have an understanding of how it works. There are certain quality requirements, earthquakes, for example: in certain areas you need to know about that, or about certain other things in the project. We can look up the location of a project, and it might be relevant context for the decision about the right product. So we have developed certain criteria that are relevant. In the beginning, when we had a small number of customers, it was of course possible to do this qualitatively and act on it. But later on, we can't be experts everywhere, so we needed a way of abstracting it, enabling AI agents to come up with such quality measures at a more abstract level and ask the right questions in the end, so that we place the right products. Because, as I said in the beginning, certain requirements could add up to millions in damages, so we really have to be conscious about what we are offering. And if we led the user to overlook a certain keyword or a certain requirement that was written somewhere, that would be a problem. That's why we are investing so heavily in these quality measures as well.
A
Okay, so it does sound like there is some concept of requirements or a spec, and then it's looking for: does the product match the spec? And then, Matthias, you said something about having the main orchestrator agent, but there's another agent observing and asking, did you consider this element, maybe this isn't as good of a fit. Tell me a little bit about that process.
C
So basically, we try to have a fair evaluation of what we did without a human looking at it. We have a specialized agent, and we try to keep some of the data away from the other part of the application so that it doesn't interfere, so you really get a fair evaluation. That agent then looks at what we did, looks at the data we give it and some of the rules, and tries to figure out: does this match? The easiest thing, as I said, is consistency, and then there are a lot of other rules. And then it gives feedback on what happened. I like the analogy of a code review agent that looks at your code base and says, hey, have you thought about this? That's actually not a bad practice. And then it circles back and tells the other agent, hey, this is my feedback, do you think you can improve on this? And then hopefully it can, yeah.
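A minimal sketch of that worker-critic loop, where `worker` and `critic` stand in for LLM-backed agents and the critic is given project facts (like "this is a hospital") that are deliberately kept away from the worker. The round limit and the return shapes are illustrative assumptions.

```python
def propose_and_review(worker, critic, request: str, project_facts: dict,
                       max_rounds: int = 3):
    """worker(request, feedback) -> proposal; critic(proposal, facts) -> review."""
    feedback = None
    for _ in range(max_rounds):
        proposal = worker(request, feedback)
        # The critic reviews with context the worker didn't necessarily use,
        # e.g. "this is a hospital, residential-grade products won't do".
        review = critic(proposal, project_facts)
        if review["approved"]:
            return proposal
        feedback = review["comments"]  # circle back and try again
    return None  # couldn't converge: hand the choice to the human
```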
A
Okay, so the orchestrator agent does the exploration, makes its best guess, a review agent looks at that, gives feedback, and then the orchestrator agent gets to try again.
C
Yeah.
A
At what point does the orchestrator, like, when does this stop? At what point does the orchestrator agent just decide, I've addressed enough of your feedback? Or does the reviewer say, okay, we're good?
C
There are also more agents involved in this. So at some point, some of them will just say, hey, I did what I can, because they know what steps they took. And at some point they will just say, hey, I can't make sense of it. Maybe if you gave me more information I could, but with the information I have, I just can't do anything for you. And that's basically the point where everything says, okay, we did our best, but we're sorry, we just don't know. Please help us.
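As a rough illustration of the loop just described (draft, review, revise, and a give-up path that escalates to a human), here is a minimal sketch. The agent interfaces, return values, and the round cap are all assumptions for illustration, not Tendos's orchestration code.

```python
# A minimal sketch of the draft-review-revise loop described above.
# It assumes an orchestrator exposing draft()/revise() and a reviewer
# exposing evaluate(); these interfaces are hypothetical.
MAX_ROUNDS = 3  # assumed cap on review rounds; the guests gave no number

def run_with_review(orchestrator, reviewer, request):
    """Draft an offer, let the review agent critique it, and revise until
    the reviewer approves, the orchestrator gives up, or rounds run out."""
    draft = orchestrator.draft(request)
    for _ in range(MAX_ROUNDS):
        approved, feedback = reviewer.evaluate(draft)  # e.g. consistency rules
        if approved:
            return draft, "approved"
        revised = orchestrator.revise(draft, feedback)
        if revised is None:
            # "With the information I have, I can't do anything more":
            # stop and hand the case to a human with the reviewer's notes.
            return draft, "needs_human_input"
        draft = revised
    return draft, "max_rounds_reached"
```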
A
Yeah. Okay. So you have a swarm.
B
Yes.
A
And then ultimately something gets presented to the customer and the human has the final say.
C
Yeah.
A
Okay. And we talked a little bit about evals throughout this. Is there anything else on evals that you want to share that you think is interesting?
C
Interesting? Yeah, I think what's interesting is what we figured out: there's not just one eval set that you can have, of course, per customer, per use case. Figuring out how to evaluate each agent in this application was quite tricky in the beginning, especially doing that at scale. But we see it as vital, because it shows you the little changes that happen within the application and makes debugging so much easier, because you actually know which chains, where should I look. If you're just evaluating the whole chain, you have no idea where in between it might go in the wrong direction, and then you really have a hard time debugging.
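To illustrate the difference between step-level evals and whole-chain evals, here is a minimal sketch assuming each agent step can be called on its own with its own labeled examples. The step names, example shapes, and scorers are hypothetical.

```python
# Minimal sketch: score each pipeline step against its own eval set,
# so a regression points at one step instead of the whole chain.
def eval_step(step_fn, examples, scorer):
    """Average score of one step over its labeled examples."""
    scores = [scorer(step_fn(ex["input"]), ex["expected"]) for ex in examples]
    return sum(scores) / len(scores)

def exact_match(got, expected):
    return 1.0 if got == expected else 0.0

# Usage (assuming categorize_email, extract_products, and their eval
# sets exist): a drop in only one entry localizes the debugging effort.
# report = {
#     "categorize": eval_step(categorize_email, categorize_examples, exact_match),
#     "extract": eval_step(extract_products, extract_examples, exact_match),
# }
```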
A
Yeah. So it sounds like you're doing evals at each step, not just on the whole chain. Obviously, your human in the loop gives you feedback on the whole chain, but you're also looking at each step: how do we get the most performance out of each piece?
C
Yeah. And I think what might also be interesting: we looked at a lot of the tracing, observability, and evaluation solutions out on the market, and currently we are still working on our own tooling, because we figured that some of the things we would actually like to have are not there yet, for some reason. We're still very open, and maybe at some point we will find the right solution. Maybe it's just us being too stubborn to pick one of these.
A
What are some of the things that you feel? First of all, I hear this a lot, so it's not super surprising to me. But I'm curious, what are some of the things that you feel like are missing?
C
One thing that helped us a lot was having the possibility to ask questions of our evals, combining that with our data, which is quite a hard task to do in a general way, because everyone has different data sources and the setup is different. But in the end, it's about having a quicker, more precise evaluation of your eval results: looking at what went wrong, can we classify this into groups, can we look at past data and see if it really changed or if it's the same? Because if you're looking at thousands of entries, it gets harder and harder to figure out, okay, was it always wrong, and how wrong are we? Is this completely off, or is it just, oh, we mixed up left and right here? Having something like that really helps.
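A minimal sketch of what "asking questions of your eval results" could mean in practice: bucket failures into coarse classes (e.g. "mixed up left and right" versus "completely off") and diff the buckets between two runs. The record shape and the failure taxonomy are assumptions for illustration, not Tendos's tooling.

```python
from collections import Counter

# Assumed record shape: {"expected": [labels...], "got": [labels...]}
def classify(record):
    """Map an eval record to a coarse failure bucket."""
    expected, got = record["expected"], record["got"]
    if got == expected:
        return "pass"
    if got and sorted(got) == sorted(expected):
        return "ordering"      # right items, wrong order ("left/right mixed")
    if not got:
        return "no_answer"
    return "wrong_value"       # completely off

def diff_runs(previous, current):
    """How did each failure bucket change between two eval runs?"""
    before = Counter(classify(r) for r in previous)
    after = Counter(classify(r) for r in current)
    return {k: after[k] - before[k] for k in before.keys() | after.keys()}
```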
B
From a UX perspective, we built our own interfaces for understanding the argumentation, the reasoning of the models, because we had so many interactions happening along this chain. With the solutions that existed at the time, and maybe that has changed meanwhile, it was really hard to dig into the details afterwards, with so many conversations going on, and to find the right point in time. I remember, Matthias, when we built our own solution, we said it would be even quicker to build it ourselves, because existing solutions were capable of doing such a thing, but they were not optimized for the amount of decisions that happened along that whole chain.
A
This, I think, when viewing traces and doing your error analysis, having a custom interface is so important. Yeah, fascinating. It's funny how quickly we expect to be able to talk to our data. LLMs have really changed the way we think about the world, right? I need to talk to my data, and that's not possible with these tools. All right, we are really cutting it close on our time window here. Let me ask you this: what's next for your, it sounds like a very strong multi-agent system. What's coming next?
C
More agents.
B
More agents.
C
Yeah, there are always more agents, more problems to solve. We are trying to expand further, so we're growing the engineering team, because more and more customer requests are coming in, and not just the small ones, please add a button here, but rather, hey, we have a use case that you could take a look at, please help us. And we are actually looking more into the interaction side: how can we make it even easier for these employees in the construction industry to interact with our solution and get answers even faster?
A
Yeah, amazing. I really love, it's very clear that you're letting your customers pull you, but not in the classic B2B way of we're just going to build every feature request. It's more that you're really trying to understand: where can we be of service, how can we take on more of this work, how do we extend our pipeline, how do we handle more use cases? And I'm not surprised to see that it's working really well for you.
C
Yeah.
B
And I think, as you described, it's a lot about opportunity cost. You have to make a clear prioritization, because everything is possible currently. What we're currently doing is also reworking parts of the application so that we are more flexible. Where we started from, you have a certain linear workflow, and it works. But just as the tech is moving so fast, where instead of years it's now months, the same goes for UX. We see some new patterns arising, and we also believe internally that we understand more of where we have to go in the future: to be more flexible, eventually more adaptable to the needs of our customers, so that instead of needing a new feature, we have more flexibility, while staying at the same high quality bar and making sure that the 90% case is not typed into a chat interface but stays in a clear structure, so that a user can do that high-frequency task in a fast manner.
A
Yeah, amazing. I have really enjoyed hearing more about your story. Thank you for spending the time with me.
B
Thanks Teresa.
C
Thank you. It was a pleasure. Thank you.
A
If you enjoyed this conversation, please subscribe in your favorite podcast app and give us a rating as it helps others find the show. Thanks. I appreciate it.
Podcast: Just Now Possible
Host: Teresa Torres
Guests: Daniel (CPO, Tendos), Matthias (CTO, Tendos)
Date: January 15, 2026
This episode dives deep into the story behind Tendos AI, an agent-swarm platform transforming how manufacturers in the construction industry manage and respond to incoming requests—particularly by turning complex, unstructured construction emails and documents into accurate quotes. Host Teresa Torres explores with Daniel and Matthias how the Tendos team identified pressing industry inefficiencies, validated the AI solution, approached prototyping, expanded their product scope, and built a robust, multi-agent system to streamline construction workflows. The discussion covers early bets on AI, the unique challenges of parsing multi-format documents, technical architecture, iterative evaluation, and user adoption subtleties within a traditionally difficult-to-digitize domain.
[00:32-04:12]
Daniel: Tendos sits as a "system of action" above legacy ERP and CRM systems, targeting back-office workflows for manufacturers, where inefficiency and a high volume of manual email traffic are rampant.
Tendering chain complexity: Construction projects involve multiple parties (owners, architects, planners, manufacturers, wholesalers) and are not 1-to-1 relationships but vast, branching networks. Manual alignments and slow communication create enormous delays.
Teresa: Shares a personal analogy about managing a house renovation, illustrating the fragmented, multi-party coordination problem—even at small household project scale.
[04:12-07:24]
A robust, detailed offer leads to customer trust and efficiency.
Tendos AI's focus: Automate the prioritization, extraction, and structuring of key data from incoming emails and complex attachments (PDFs, Excels).
The primary workflow starts with incoming (sometimes unstructured) requests, categorizes and prioritizes them, extracts relevant product and partner information, and pre-populates draft offers for human review.
[07:39-11:16]
Matthias: The system can automate simple offers but keeps a "human in the loop" for complex and high-stakes cases.
[13:19-18:23]
Tendos was founded out of direct personal experience with construction industry inefficiencies.
The leap to AI was enabled by a believed "moment of change"—AI unlocking enterprise willingness to reimagine established workflows.
Initial user response was so positive that users anthropomorphized the system (gave it a name, "chassis") and quickly began requesting the AI system replace existing modules in legacy software (like SAP CPQ).
[19:12-32:08]
First "bite of the apple": The team started with a single, narrow product line (specialty radiators) with a design partner company, isolating complexity and validating the solution in a contained environment.
This hyper-focus built product quality and confidence before gradual expansion.
Sitting alongside end users, observing real-world usage, surfaced countless adjacent inefficiencies and expanded the scope of automation.
[38:21-65:21]
Current product: Handles the full workflow from email intake (support, offer, order), auto-categorizing, prioritizing, tying to CRM/project, extracting request content, and generating draft offers/support responses for human approval.
Technical details:
Eval & feedback: Heavy investment into step-level evaluation and tracing—each agent decision is checked, and both user feedback and production data are used to refine performance.
[34:29-36:28]
Tendos deliberately avoids overextending, focusing tightly on segments where their agentic automation delivers highest value (e.g., avoiding road-building projects, which involve entirely different workflows/data).
They eschew scaling support teams, instead prioritizing engineering and AI-driven automation of support and new workflows.
[63:15-65:21]
Tendos AI’s story stands out as an archetypal example of vertical AI product-building: starting with deep, founder-driven domain expertise; narrowly scoping the first use case; and using both technological advancements and close user partnership to iteratively expand. At every turn, the Tendos team balances customer pull with a rigorous insistence on product focus and technical depth, building a robust, multi-agent system capable of handling real-world, high-stakes construction workflows.
If you're building AI-driven automation for manual, high-variance domains, or seeking lessons on human-in-the-loop, agentic architectures, or evaluating AI at scale, this episode is a treasure trove.