Practical AI Podcast — Episode Summary
Episode: We’ve all done RAG, now what?
Date: September 29, 2025
Host: Daniel Whitenack, CEO at Prediction Guard
Guest: Rajeev Shah, Chief Evangelist at Contextual AI
Theme: Moving beyond first-wave retrieval-augmented generation (RAG) deployments — real-world lessons, the evolution of AI workflows, and where value is created in applied AI today.
Main Theme & Purpose
This episode explores the shift from the initial excitement and quick demos with Retrieval Augmented Generation (RAG) systems—used widely for knowledge search, customer support, and internal documentation—to the deeper challenges and “next steps” organizations encounter as they bring these systems toward scalable, maintainable real-world impact. The discussion delves into the importance of context engineering, critical misconceptions around model training versus retrieval, evaluation challenges, the evolving roles in AI/data, and navigating hype to achieve business value.
Key Discussion Points & Insights
1. The Arc of RAG: From Pilots to Production
Initial Hype and Use Cases:
- RAG became the go-to for making language models useful with organization-specific or sensitive data (e.g., HR docs, healthcare guidelines).
- [04:46] Rajeev: “RAG today is kind of one of the most important, or it’s one of the most widely used use cases… every company is probably running some type of RAG at this point for searching its internal knowledge, right?”
Misconceptions About Model Training:
- Many users confuse the need for LLM fine-tuning with what retrieval accomplishes.
- [06:11] Daniel: “It almost seems like OpenAI has a separate model for every person on the planet, which is not feasible... there’s kind of this jargon of training thrown around a lot, which is confusing.”
- Rajeev: Emphasizes that context engineering—providing the right information at the right time—is often much more practical than retraining.
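The practical difference is worth making concrete: retrieval never updates model weights; it just places the right text into the prompt at query time. A minimal illustration (the function name and snippet text are hypothetical, not from the episode):

```python
def build_prompt(question: str, retrieved_snippets: list[str]) -> str:
    """Assemble a grounded prompt: the model's weights are untouched;
    we only inject retrieved text as context."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(retrieved_snippets))
    return (
        "Answer using only the context below. Cite snippet numbers.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Snippets an HR-policy retriever might return (invented for illustration):
snippets = [
    "PTO accrues at 1.5 days per month.",
    "Unused PTO rolls over up to 10 days.",
]
prompt = build_prompt("How much PTO do I accrue?", snippets)
```

Swapping the snippets swaps the model's effective "knowledge" per request, which is why this is usually far cheaper to operate than maintaining fine-tuned model variants.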
Scaling Pitfalls:
- Moving from toy demos to real-world deployment exposes challenges in scaling, latency, accuracy, user interaction diversity, and mounting pipeline complexity.
- [24:05] Rajeev: “The trouble that people get into... is scaling it up. It’s great on a hundred documents, but now all of a sudden I have to go to 100,000 or a million... Or the accuracy is not what I was looking for... There’s all these kind of trade-offs as you get to production.”
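The cost curve behind that quote is easy to see in code: exact (brute-force) vector search does one similarity computation per document, so query cost grows linearly with corpus size, which is what pushes production teams toward approximate indexes (HNSW, IVF, etc.) and their recall trade-offs. A toy brute-force sketch (random data, not from the episode):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

def brute_force_top_k(query: np.ndarray, doc_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact search: one dot product per document, O(n_docs * dim) per query."""
    scores = doc_vecs @ query          # cosine similarity if rows are L2-normalized
    return np.argsort(-scores)[:k]     # indices of the k highest-scoring documents

# Fine at 100 documents; at 1,000,000 every query pays a full scan,
# which is when approximate indexes enter with their accuracy trade-offs.
docs = rng.standard_normal((100, dim))
query = rng.standard_normal(dim)
top = brute_force_top_k(query, docs, k=5)
```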
2. Beyond Retrieval: Reasoning, Agents, and Context
Reasoning in LLMs:
- The field has shifted from models simply recalling facts to exhibiting “reasoning” (structured, stepwise problem-solving, e.g., multi-step math).
- [13:32] Rajeev: “They’re doing lots of extra steps and they’re doing these steps in a logical way to better solve a problem... We’ve literally given examples of, hey, this is how I solved this word problem… I want you to learn how to go through these problems step by step.”
Context Engineering:
- Effective systems now require orchestration—retrieving, memory management, re-ranking, query reformulation, summarization.
- [07:07] Rajeev: “What we see inside of AI engineering [is] context engineering... managing interactions with these models. Whether it’s RAG, memory... summarizing conversations...”
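Those orchestration steps can be sketched as a pipeline. The stubs below are deliberately naive stand-ins (word-overlap retrieval, last-two-turns history folding) just to show the shape of the pipeline; real systems swap in embedding retrieval, re-rankers, and summarizers:

```python
def reformulate(query: str, history: list[str]) -> str:
    """Fold recent conversation history into a standalone query (stub: concat)."""
    return f"{' '.join(history[-2:])} {query}".strip()

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Stub retriever: rank documents by count of words shared with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

def build_context(query: str, history: list[str], corpus: list[str]) -> str:
    """Context engineering in miniature: reformulate -> retrieve -> assemble."""
    standalone = reformulate(query, history)
    passages = retrieve(standalone, corpus)
    return "\n".join(passages)

corpus = ["Refunds take 5 days.", "Shipping is free over $50.", "Support hours are 9-5."]
ctx = build_context("how long do refunds take", [], corpus)
```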
Agents and Autonomy:
- As tooling matures, the line blurs between rigid workflows and more autonomous “agent” systems.
- [28:33] Rajeev: “We all like the idea of this agent, right? Like something I can give a problem to and it solves a problem... The big trade-offs that developers have today is how much structure, how much babysitting am I doing for this agent?”
3. Achieving Business Value: Hype vs. Reality
The Science Experiment Trap:
- Many projects become “science experiments” that either don’t deliver measurable value or fail to integrate with real workflows.
- [19:23] Rajeev: “I see often what I call science experiments where teams like the latest technology, they go out and run this stuff, but there’s no way for them to actually get that implemented inside the company in a useful way.”
Organizational Adoption is Harder than the Technology:
- Embedding even simple AI into workflows brings the hardest challenges (training, change management, integration).
- [20:39] Daniel: “Part of the hard problem is cracking what actually does provide value to your organization, what can be adopted, how you communicate that, how you tell that story…”
The 95% Failure Rate (Myth & Reality):
- While the “95% of AI pilots fail” statistic is often cited, it’s not unique to AI (applies to experimentation in general) and isn’t necessarily a bad thing.
- [19:23] Rajeev: “You want things to fail… because if something works, you have to maintain it... there’s a cost for something that actually succeeds.”
Start Simple, Stay Close to Users:
- Instead of chasing complexity, prioritize simple AI that addresses genuine use cases and deeply involves end users from the start.
- [22:19] Rajeev: “...once you cut through that, sometimes you figure out that really they don’t necessarily need a fancy GPT-5 model to solve their problems... spending time talking to those end users is going to give you the biggest bang for your buck..."
4. The Evolution of Roles: Data Science, AI Engineering, and Citizen Developers
Changing Landscape:
- More business domain experts now build sophisticated solutions themselves.
- [32:06] Daniel: “I see that middle zone shrinking... domain experts on the business side are actually able to use very sophisticated tools now to kind of self-serve…”
Data Science Still Needed, but Shifting:
- The essence of data science—connecting technical and business perspectives, analytics, evaluating impact—remains crucial.
- [32:55] Rajeev: “At the end of the day, a journalist is a storyteller telling you... For me, the data science is a similar piece... you still need a flexible mind as a data scientist to talk to stakeholders, figure out the coding, the algorithms…”
Evaluation Remains a Key Gap:
- Software engineers moving into AI often need to learn how to properly evaluate and monitor models, a strength of traditional data science training.
- [34:41] Rajeev: “One of the biggest problems they have is with evaluations. And for data scientists, they’re trained on how to do evaluations…”
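A concrete example of the evaluation habit Rajeev means: before judging end-to-end answers, measure the retriever alone against a small hand-labeled set with a metric such as recall@k. A minimal sketch (doc ids and labels are invented):

```python
def recall_at_k(results: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Fraction of queries where at least one relevant doc id appears in the top k."""
    hits = sum(1 for got, rel in zip(results, relevant) if rel & set(got[:k]))
    return hits / len(results)

# Two toy queries: the retriever's ranked doc ids vs. hand-labeled relevant ids.
ranked = [["d3", "d1", "d9"], ["d2", "d7", "d4"]]
gold = [{"d1"}, {"d8"}]
score = recall_at_k(ranked, gold, k=3)  # query 1 hits, query 2 misses
```

Even a few dozen labeled queries like this make pipeline changes (chunking, re-ranking, index type) comparable with a number instead of a vibe.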
5. Looking Forward: Blending Old and New AI Tools
Generative AI Isn’t Always the Best Solution:
- Many enterprise problems can be solved without LLMs; traditional techniques often work better or more efficiently.
- [36:14] Rajeev: “There’s a lot of problems inside an enterprise that can be solved without large language models... my worry is the people coming into kind of AI and data science nowadays aren’t seeing those types of problems…”
Continuous Learning and Experimentation Recommended:
- Stay up to date but don’t get distracted by hype—combine daily, incremental learning with focus on durable value.
- [41:40] Rajeev: “Continual learning is the future... I have my own content that I put out at Rajistics... Newsletters are a nice way to be able to take in all the information that's coming in, but in a little bit of a slower kind of meditative way.”
Notable Quotes & Memorable Moments
- On the Experience of Scaling RAG ([24:05]): Rajeev: "The trouble that people get into... is scaling it up. It's great on a hundred documents, but now all of a sudden, I have to go to 100,000 or a million documents. How am I going to do that?"
- On Separating Value from Hype ([17:19]): Rajeev: “When you’re in an organization, you really have to think about the problems that you have... it can be very easy to be kind of seduced by the technologies, by what a shiny demo is…”
- On Organizational AI Failure ([19:23]): Rajeev: "You can't expect every initiative, every experiment, everything that you start to succeed, you want things to fail... there's a cost for something that actually succeeds."
- On the Role of Data Scientists ([32:55]): Rajeev: “You still need a flexible mind as a data scientist... where you need a lot of this kind of left brain, right brain stuff. And so it’s still a fairly unique role.”
- On the Balance Between GenAI and Classic Solutions ([40:21]): Rajeev: "There's a great wake of tools that are out there that I still like to kind of point people to — it might not get the most attention, but... a lot of times [it's] a more efficient way of solving your problem as well."
Timestamps for Key Segments
- 00:48–04:46 — Rajeev returns: State of Midwest AI, recap of RAG’s rise, how retrieval fits into practical AI
- 06:11–08:31 — Misconceptions about LLM training, context engineering explained
- 09:59–13:32 — Real-world knowledge, dealing with conflicting “facts,” what counts as “reasoning” for LLMs
- 17:19–22:56 — RAG and the science experiment trap, organizational integration, why 95% of pilots fail
- 24:05–27:25 — Pitfalls of scaling RAG, complexity of pipeline maintenance, the future role of reasoning models in troubleshooting
- 28:33–30:17 — What’s an “agent”? Autonomy vs. plumbing; are we repeating the AI-vs-ML-vs-DS debate?
- 32:06–36:14 — Changing roles: Shrinking middle layer, evaluating models, balancing software and data science mindsets
- 39:04–40:21 — Will LLMs recommend older tools? The persistent need for human orchestration
- 41:40–42:18 — Rajeev’s advice: Newsletters, continual learning, and mixing “old” tricks with new tech
Further Resources
- Rajeev Shah: TikTok, LinkedIn, and his “Rajistics” content stream for practical AI insights
- Practical AI Podcast: practicalai.fm
- Midwest AI Summit: midwestaisummit.com
- Relevant newsletters: As recommended by Rajeev for “meditative” AI learning
TL;DR
While everyone can now build a RAG chatbot, the real challenge (and opportunity) for enterprises is in scaling, integrating, and maintaining AI solutions that truly align with organizational workflows. The conversation debunks myths about model “training,” covers the new roles of context and reasoning, and urges listeners to focus on solving concrete problems—often with simple tools—rather than chasing shiny, unproven tech. Data science and context engineering are evolving, not disappearing, and success is still driven by user-centric design, continuous learning, and pragmatic adoption of both old and new AI techniques.
