Transcript
A (0:00)
Welcome to Lenny's Reads, where I bring you audio versions of my newsletter about building product, driving growth, and accelerating your career. Agents are so hot right now. Every other day someone's launching a new one or a new tool to manage them. I bet your team has a half dozen agent ideas on your backlog right now. None of this means you actually need to build an agent today, but it does mean that you need to understand how agents fit into your broader strategy and what the right investment looks like. Hamza Farooq and Jaya Rajwani teach two of the most highly rated and well-respected courses on building AI agents, Agent Engineering Bootcamp and Agentic AI for PMs. They spent over 50 hours putting this guide together. By the time you finish listening to this episode, you'll understand the three types of agents, how to decide which initiatives to prioritize, and how to avoid common pitfalls. This includes specific recommended tools and platforms and tons of real-life examples. The rest of this episode was written by Hamza and Jaya and is narrated by me. Let's get into it. Over the past year, we've had the same conversation at least 30 times. An AI leader pulls up their roadmap, usually five to 10 agent initiatives, and says, "Help us figure out which one to build first." The list usually includes a PM assistant, a RAG copilot, a customer support system, a code review agent, and a voice-enabled shopping assistant. If you're listening to this, you probably have a similar list. Your team is energized, investors are asking about it, competitors are announcing agent launches. You need to pick something and ship it. That's where most teams get stuck. The problem isn't that they lack ideas; it's that they try to prioritize fundamentally different kinds of systems as if they were the same thing. The usual approach is to reach for familiar planning tools. Teams open an impact-versus-effort matrix and try to compare ideas side by side. But with AI agents, that quickly falls apart. One agent might take six weeks to build; another might take six months. One agent can be assembled by a product manager using n8n; another requires a dedicated machine learning engineering team. One costs $500 per month to operate; another could generate a six-figure annual LLM bill. A customer support assistant and a voice-enabled shopping agent may both be called agents, but they demand different architectures, different teams, different infrastructure, and different timelines. Until you recognize those differences, any attempt to compare effort or impact is essentially guesswork. Treating architecturally different products as if they're in the same category makes effective prioritization nearly impossible. Prioritization breaks not because teams are bad at planning, but because they're comparing apples, oranges, and jet engines on the same spreadsheet. The missing step is hierarchy. Before you can decide which agent to build first, you need to answer a more basic question: what type of agent is each idea actually proposing? The answer determines almost everything that matters for planning: how complex it will be to build, what skills and infrastructure are required, how long it is likely to take, how expensive it will be to operate, and how you should measure success. In other words, categorization isn't just a technical exercise; it's the foundation for smart prioritization. This episode gives you a decision framework you can start using today with your current roadmap.
We developed this framework from patterns we've seen while helping organizations turn agent ideas into real production systems. Working with enterprise teams across Fortune 500 companies such as Jack in the Box, TripAdvisor, and The Home Depot, we found that grouping ideas by their underlying architecture unlocks prioritization and significantly speeds up the development and launch process. These distinctions also mirror how the broader industry is beginning to classify AI agents, from automation workflows to reasoning systems and multi-agent networks, as in the Levels of Autonomy for AI Agents paper and Types of AI Agents by IBM. These are also the foundations of how massively popular tools like OpenClaw and Claude Code are actually architected. If you're staring at a backlog of agent ideas trying to figure out what to build first, here's what you'll have by the end of this episode: a guide to picking the right tool or platform for your project (like when to use n8n vs. LangGraph vs. ADK), success metrics and ROI frameworks tailored to each architectural type, and warning signs that you've picked the wrong path. You'll be able to look at your backlog and know which ideas can ship in six weeks for quick ROI, which need three months but will drive significant revenue growth, and which are a six-month bet that only makes sense with the right resourcing and expectation setting, all by first recognizing that "agent" is an umbrella term for very different kinds of systems. Every agent idea falls into one of three architectural categories. Category 1 is deterministic automation. You define the entire flow; AI handles content at specific steps. Think n8n or Zapier workflows with LLM nodes. This is where the majority of agent opportunities belong, and where most teams should start. These projects are fastest to launch and deliver measurable ROI quickly. Category 2 is reasoning and acting agents. AI decides what to do next using available tools. Think Cursor, Lovable, or agents built with LangGraph, CrewAI, Google ADK, and similar tools. These initiatives typically come after Category 1, when higher-value problems require flexibility and dynamic decision making that workflows alone can't handle. Category 3 is multi-agent networks. Multiple specialized agents coordinate with each other. Think enterprise systems built with ADK or AutoGen. These projects are typically reserved for later stages, when multiple teams must coordinate across domains, and should almost never be the starting point on a roadmap. Here are some examples of agents that fit into each category to help you understand these differences. The first example agent is a customer support system that troubleshoots technical issues, accesses account history, and resolves problems without human handoff. This is a Category 1 agent because the steps are known: first classify, then search the knowledge base, then route. You just need AI to understand content and follow your flow. The second example agent is a code review agent that checks for security vulnerabilities, suggests optimizations, and updates documentation across repositories. This is a Category 2 agent because you can't map every possible code change in advance. The system needs to reason: which files do I need? What's the right refactoring approach here? The third example agent is a global retailer where sales, inventory, and logistics systems must coordinate to fulfill customer orders across regions. This is a Category 3 agent because no single agent owns the full workflow.
A shopping agent confirms an order, an inventory agent checks stock across warehouses, and a logistics agent determines delivery feasibility and routing. Organizations often try to build Category 1 products with Category 2 frameworks, over-engineering solutions that add unnecessary complexity and cost. Less frequently, but with worse outcomes, they try to solve Category 2 problems with Category 1 tools, and it breaks in production because the tool is not robust enough. Let's take a deeper dive into each category, starting with the workhorse: Category 1, deterministic automation. These are workflows where you define every step, every branch, every decision point. An LLM handles natural language understanding and generation at specific nodes, but you control the flow. Think of them as intelligent flowcharts where you design the path and AI handles the content. Tools most commonly used for deterministic automation are n8n, Zapier, Make.com, OpenAI AgentKit, Lindy, and Gumloop. These tools are built around explicit triggers and predefined branching logic. You define the workflow, while LLMs are used only for classification, extraction, or drafting within those boundaries. Here's how you can prioritize Category 1 products. If your backlog includes a mix of agent ideas, Category 1 projects are almost always the smartest place to begin. These initiatives tend to be the simplest to plan and the lowest risk to execute. They're best suited to situations where the process is already well defined and the goal is to automate repetitive, high-volume work. If you need quick, measurable ROI, have limited AI engineering capacity, or are under pressure to deliver results in weeks rather than months, Category 1 projects are almost always the right starting point. Most initiatives in this category share a similar profile across certain criteria. These projects typically involve a PM and a software engineer, require a workflow automation tool and basic LLM access, take two to six weeks to complete, and are low cost and low complexity. That combination of fast timelines, modest resources, and clear business impact is what makes Category 1 initiatives such powerful early wins. They generate near-term value while building organizational confidence for more advanced efforts later. Here's how to tell what types of products fall in this category. If you can map the entire process as a flowchart with clear decision points, a product belongs in Category 1. Here are some more traits of a Category 1 product: execution paths are finite and predictable, with fewer than 15 to 20 branches; task completion needs to happen in seconds to minutes; and the value is in automating a known process, not discovering new approaches. In our experience with customers, this covers 60 to 70% of agent opportunities. Revisiting the typical list of opportunities I mentioned above, here's a great example of a Category 1 product: an AI agent that handles incoming customer emails, reads them, understands what they're asking, pulls relevant information from docs, drafts replies, and routes to the team for approval. At first this sounds like it needs sophisticated reasoning, but when you map out what actually needs to happen, it's remarkably deterministic. Every step is predictable. The intelligence is in understanding the email and generating a good response, not in figuring out what to do next. This is Category 1.
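To make that concrete, here's a minimal sketch of the email workflow in Python. This is our illustration, not any specific tool's code: `call_llm` and `search_knowledge_base` are hypothetical placeholders for whatever LLM client and docs search you actually use. The point is that every branch is fixed in code, and the LLM only fills in content at specific nodes.

```python
# Minimal sketch of a Category 1 deterministic workflow. The control flow is
# fixed in code; the LLM only classifies and drafts at specific nodes.
# `call_llm` and `search_knowledge_base` are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    """Stand-in for a call to whatever LLM API you use."""
    raise NotImplementedError

def search_knowledge_base(query: str, category: str) -> str:
    """Stand-in for a docs/KB search scoped to the classified category."""
    raise NotImplementedError

def classify_intent(email_body: str) -> str:
    # The LLM handles understanding, but the set of allowed labels is fixed.
    prompt = ("Classify this support email as one of: "
              "billing, technical, account, other.\n\n" + email_body)
    label = call_llm(prompt).strip().lower()
    return label if label in {"billing", "technical", "account"} else "other"

def handle_support_email(email_body: str) -> dict:
    # Step 1: classify. The intelligence lives inside the node.
    category = classify_intent(email_body)

    # Step 2: fixed routing. Unknown intents always escalate to a human.
    if category == "other":
        return {"action": "escalate", "reply": None}

    # Step 3: pull relevant docs, then draft a reply grounded in them.
    context = search_knowledge_base(email_body, category)
    reply = call_llm(f"Using only this context:\n{context}\n\n"
                     f"Draft a reply to this customer email:\n{email_body}")

    # Step 4: route to the team for approval. A human stays in the loop.
    return {"action": "await_approval", "category": category, "reply": reply}
```

Notice that the path never depends on the model deciding what to do next; every branch was chosen when the workflow was written, which is exactly what makes this category testable, predictable, and cheap to operate.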
There are a ton of great examples of automation agents. I built one for Airbnb. I love Airbnb, but I hate spending long hours finding the best ones. So I built an agent that will take my exact request, for example, "modern apartment in Paris near train stations, from March 20 to March 26, great for a couple," and run a search. More than 10,000 users have used it, and the details for building your own can be found in the written version of this episode, linked in the show notes. Other examples of Category 1 agents include a travel planning agent, a voice-enabled book companion agent, a content automation agent that converts YouTube content to LinkedIn posts, a knowledge base of your organization with internet search (like a Perplexity clone), an agent that generates deeply researched blogs based on a given topic, and a highly personalized calorie counter app that allows you to upload images of your meals to keep track of your daily caloric intake and recommends better dietary choices and exercises. Here's how to evaluate Category 1 products. The following metrics are designed to answer a simple question: did this agent automate the right process, or should this idea be reconsidered or re-scoped? A deterministic agent built for the email automation process can be evaluated against these metrics. Workflow completion rate, which is the percentage of executions that finish successfully. Automation rate, which is the percentage of requests handled without human intervention. Accuracy, which covers correctness of intent classification, data extraction, and routing decisions. Latency, measuring time from trigger to final output, with P50 and P95 metrics if relevant. Cost per workflow, covering total LLM and API cost per completed run. Error rate, measuring the percentage of runs failing due to tool, integration, or system errors. Human review rate, tracking the percentage of runs requiring manual approval or intervention. Here are workflow completion rate metrics from a real-life example of a Category 1 product, an email support agent built by a SaaS company we worked with. Week one had a 52% completion rate, with lots of edge cases discovered. Week four reached a 78% completion rate with refined classification logic. Week eight achieved an 87% completion rate, stable and production ready. The result was 3,000 support emails per month automated, two and a half full-time-equivalent hours per day freed, and $18,000 per month in savings. When these metrics stabilize and cost trends downward, the workflow is doing what it should. If completion remains low or manual intervention stays high, the problem may not be deterministic enough for this category. How do you know if you've outgrown Category 1? Here are six signs. The first is when your flowchart has 30-plus nodes and you're adding new branches every week. The second is when customers phrase things in ways you can't anticipate, and mapping all variations is impossible. The third is when the agent needs to decide which API or knowledge source to use based on context, not follow a predetermined path. The fourth sign is when breaking down ambiguous requests requires exploration and adaptation, not predefined decomposition. The fifth is when the highest-value opportunities can no longer be expressed as predictable workflows. And the sixth sign is when most quick-win processes are already automated. If several of these signals are present at once, the problem is no longer a good fit for a deterministic workflow and you should consider Category 2.
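If you want to track those evaluation metrics without much ceremony, here's a small illustrative sketch of how they roll up from a log of workflow runs. The record fields here are hypothetical; substitute whatever your automation tool actually exports.

```python
# Illustrative sketch: rolling up Category 1 metrics from run logs.
# Field names are hypothetical, not any specific tool's export format.
from dataclasses import dataclass

@dataclass
class WorkflowRun:
    completed: bool      # did the run finish successfully?
    needed_human: bool   # did a person have to approve or intervene?
    cost_dollars: float  # total LLM + API spend for this run

def summarize(runs: list[WorkflowRun]) -> dict:
    total = len(runs)
    completed = sum(r.completed for r in runs)
    fully_automated = sum(r.completed and not r.needed_human for r in runs)
    return {
        "workflow_completion_rate": completed / total,
        "automation_rate": fully_automated / total,
        "human_review_rate": sum(r.needed_human for r in runs) / total,
        "error_rate": (total - completed) / total,
        "cost_per_workflow": sum(r.cost_dollars for r in runs) / total,
    }
```

Watching these numbers week over week is what turns the 52%-to-87% trajectory described above from an anecdote into a dashboard.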
Here's a deep dive into Category 2. With this type of agent, instead of defining the flow, you define the available tools, and an LLM autonomously decides what to do next. The agent operates in a loop: observe, reason, act, observe the result, and repeat. The key characteristic is that you control the tools while the LLM controls the reasoning. Tools most commonly used for building ReAct (reason-and-act) agents include LangGraph, CrewAI, AutoGen, and other agent orchestration libraries that support tool use, memory, and dynamic planning. Here's how to prioritize Category 2 products. Category 2 is for situations where user requests are ambiguous, workflows cannot be mapped in advance, and real value comes from flexible, contextual decision making. If you need agents that can reason across multiple tools, handle conversational interactions, or adapt dynamically to new inputs, that's a Category 2 product. Category 2 products are more complex to plan and carry higher execution risk than Category 1. Most initiatives in this category share a similar profile. They typically require a team of a PM, a software engineer, and AI/ML engineers, plus infrastructure like an agent framework such as LangGraph, CrewAI, or AutoGen, connections to company tools and data, and LLM access, with a timeline of 8 to 16 weeks and moderate to high cost and complexity. The combination of longer timelines, specialized expertise, and higher costs is what makes Category 2 initiatives powerful but more demanding than Category 1. If your backlog includes problems that truly require reasoning and dynamic behavior, prioritizing Category 2 projects becomes essential. They unlock use cases that deterministic automation cannot handle and enable more advanced, high-impact agent experiences. A product belongs in Category 2 if the same user request can trigger different action sequences every time. That means you don't determine the path; the LLM does. That's the key difference from Category 1. Here are some more traits of a Category 2 product: the same high-level task requires different sequences of actions depending on input; you have 5 to 15-plus distinct capabilities, and the right one depends on context; user intent is ambiguous and needs clarification through interaction; multiple input modalities, including voice, image, and text, need to be understood contextually; and breaking down complex requests into subtasks is part of the value. In our work with customers, this is the right choice for 25% to 30% of agent opportunities. For an example of this type of product, let's use a voice-enabled shopping assistant. The opportunity: customers should be able to search products by voice, upload images to find similar items, check order status, update preferences, and initiate returns, all through conversation. At first this sounds like Category 1: just map out the intents and route accordingly, right? But in practice, real conversations don't follow fixed paths. To see why, let's walk through one interaction. A customer uploads a photo of shoes and says, "These are too small. I need a size up, and I want them delivered by Thursday." Here's what happens under the hood. First, observe: the agent receives mixed input, an image plus a voice request. Then reason: it determines it must first identify the product in the image, then find available size variants, then check delivery dates, and finally confirm the order with the user. Next, act.
The agent dynamically selects tools: visual search to identify the product, check inventory to find size-up availability, get delivery options to verify Thursday delivery, and place order after confirmation. Finally, observe the result and reason again. Each tool response updates the agent's state and influences the next step. This sequence cannot be predefined. If the item is out of stock, the agent may suggest alternatives. If Thursday delivery isn't available, it may propose pickup. If the image can't be recognized, it asks a clarifying question. The same user request triggers different action sequences based on the agent's reasoning.
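A minimal, framework-agnostic sketch of that observe-reason-act loop, in Python, looks something like the following. This isn't LangGraph or CrewAI code; `call_llm_for_action` and the tool functions are hypothetical placeholders. The key contrast with the Category 1 sketch earlier is that here the model, not your code, picks the next tool.

```python
# Framework-agnostic sketch of a Category 2 ReAct-style loop. The LLM picks
# the next tool each turn; our code only supplies the tool registry and a
# step budget. All names here are hypothetical, not a real library's API.

def call_llm_for_action(goal: str, history: list) -> dict:
    """Stand-in: given the goal and tool results so far, ask the model for
    the next step as {"tool": name, "args": {...}}, or
    {"tool": "finish", "answer": ...} when it judges the goal is met."""
    raise NotImplementedError

def visual_search(args): ...         # identify the product in an image
def check_inventory(args): ...       # find size/stock across warehouses
def get_delivery_options(args): ...  # verify delivery dates
def place_order(args): ...           # execute only after user confirmation

# We control which tools exist; the model controls when and how to use them.
TOOLS = {
    "visual_search": visual_search,
    "check_inventory": check_inventory,
    "get_delivery_options": get_delivery_options,
    "place_order": place_order,
}

def run_agent(goal: str, max_steps: int = 10) -> dict:
    history = []
    for _ in range(max_steps):
        # Reason: the model chooses the next action given the state so far.
        action = call_llm_for_action(goal, history)
        if action["tool"] == "finish":
            return {"status": "done", "answer": action.get("answer")}
        # Act: invoke the selected tool with model-chosen arguments.
        tool = TOOLS.get(action["tool"])
        if tool is None:
            history.append({"error": f"unknown tool: {action['tool']}"})
            continue
        result = tool(action["args"])
        # Observe: the result feeds back and shapes the next decision.
        history.append({"tool": action["tool"], "result": result})
    return {"status": "step_budget_exhausted", "history": history}
```

The loop is why the same request can produce different tool sequences: if the inventory check comes back empty, the model's next action differs from the in-stock case, without anyone adding a new branch to the code.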
Other examples of Category 2 agents include a conversational customer support agent, a code assistant that modifies repositories (like Claude Code), an intelligent personal shopping assistant, an IT troubleshooting agent, a sales copilot that researches accounts and drafts outreach, and a multimodal assistant combining voice, image, and text. Here's how to evaluate Category 2 products. Reasoning agents should be evaluated on whether they help users achieve their goals across variable paths while remaining efficient enough to justify their cost. The following metrics answer the question: was dynamic reasoning necessary, or should the problem be simplified to a lower category? Key metrics include task completion rate, which is the percentage of sessions where users achieve their intended goal. Reasoning accuracy, so you can measure the correctness of task decomposition, tool selection, and decision ordering. Conversation length, measured as average turns to resolution. Multimodal accuracy, which tracks correctness of image, voice, or structured input interpretation when applicable. Tool call efficiency, to measure the average number of tool calls per successful session. Latency, which looks at time per turn and end-to-end session duration. Cost per session, to track the total LLM and API cost per completed interaction. User satisfaction, measured through post-interaction customer satisfaction scores or equivalent signals. And business impact, to measure the lift in conversion, retention, or task success versus baseline. Here are some metrics from a real-life example, a voice-plus-image shopping assistant for a home goods retailer we built. Month one showed 71% task completion, longer conversations, higher tool usage, and $0.12 cost per session. Month four showed 86% task completion, shorter conversations, fewer tool calls, and $0.08 cost per session. The result was that image identification accuracy improved from 76% to 91%, conversion lift increased from 8% to 22%, and customer satisfaction rose from 4.0 to 4.5. When task completion improves while conversation length, tool usage, and cost per session decline, the agent's reasoning loop is adding value. If performance stalls while costs remain high, the problem may be overscoped or better served by the deterministic approach of Category 1 tools. Here are five signs that show you've outgrown Category 2. The first is that your single agent is trying to handle too many domains, like customer service plus inventory plus logistics plus finance, and performance is degrading. The second is that you need agents to delegate tasks to each other, not just call stateless APIs. For example, a shopping agent needs to ask an inventory agent, "Can you check all warehouses and suggest alternatives?" The third sign is that tasks take hours or days to complete, like an automated eval agent analyzing 10,000 conversations overnight. The fourth is needing hundreds of agent instances running in parallel, coordinating work. And the fifth sign is that different teams want to own their specialized agents, but they need to work together. If you're hitting two to three or more of these signs, it's time to consider Category 3 tools and approaches. Let's dive into Category 3 agents. Instead of one agent calling tools, you have multiple specialized agents that coordinate with each other. Each agent is owned by a different team, handles its own domain, and can request help from other agents.
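The full episode goes deeper on Category 3, but as a rough illustrative sketch of the shape: each domain team owns its own agent, and agents delegate to one another rather than sharing a single tool registry. The class and method names below are hypothetical, not any particular framework's API.

```python
# Illustrative sketch of Category 3: specialized agents, each owning a
# domain, delegating to one another. Names are hypothetical, not a real
# framework's API; each handle() would wrap that team's own agent loop.

class InventoryAgent:
    def handle(self, request: dict) -> dict:
        # Owned by the inventory team: checks stock across warehouses and
        # proposes alternatives, using its own internal reasoning loop.
        raise NotImplementedError

class LogisticsAgent:
    def handle(self, request: dict) -> dict:
        # Owned by the logistics team: determines delivery feasibility
        # and routing for a candidate order.
        raise NotImplementedError

class ShoppingAgent:
    def __init__(self, inventory: InventoryAgent, logistics: LogisticsAgent):
        self.inventory = inventory
        self.logistics = logistics

    def confirm_order(self, order: dict) -> dict:
        # Delegation, not a tool call: the shopping agent asks a peer
        # agent, which reasons on its own and may delegate further.
        stock = self.inventory.handle({"ask": "check_all_warehouses",
                                       "order": order})
        if not stock.get("available"):
            return {"status": "suggest_alternatives",
                    "options": stock.get("alternatives", [])}
        delivery = self.logistics.handle({"ask": "delivery_feasibility",
                                          "order": order})
        return {"status": "confirmed", "delivery": delivery}
```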
This is the end of your free preview. To hear the full episode, become a paid subscriber at Lennysnewsletter.com/subscribe. If you're already a premium member, you can add the private feed to your podcast app by going to add.lennysreads.com. Thanks for listening, and see you on the next show.