Podcast Host/Announcer
The podcast that makes artificial intelligence practical, productive and accessible to all. If you like this show, you will love the Changelog. It's news on Mondays, deep technical interviews on Wednesdays, and on Fridays an awesome talk show for your weekend enjoyment. Find us by searching for the Changelog wherever you get your podcasts. Thanks to our partners at Fly.io. Launch your AI apps in 5 minutes or less. Learn how at fly.io.
Chris Benson
Welcome to another Fully Connected episode of the Practical AI Podcast. In these episodes, where it's just Chris and I, no guest, we try to keep you updated with some of the things happening in the AI world and talk through some things that might help you level up your machine learning and AI game. So excited to dig in with you today, Chris. I'm joined as always by my co-host, Chris Benson, who is a principal AI research engineer at Lockheed Martin. And I'm Daniel Whitenack, CEO of Prediction Guard. How you doing, Chris?
Daniel Whitenack
I'm doing good. I'm looking forward to our conversation today. It's a snowy day in Georgia, and we can talk a little generative AI and talk about where you wouldn't want to use it.
Chris Benson
Yeah.
Daniel Whitenack
Unless it was snowing in Georgia.
Chris Benson
Kind of things in the theme of coldness today, which fits, since it's also cold where I'm at. We'll talk about the cold side of GenAI, or actually, what we had talked about thinking through were the bad use cases for GenAI, or where you shouldn't use GenAI: five or more bad use cases.
Daniel Whitenack
Yeah. And you know, the funny thing about it is this is a topic that we have casually talked about a whole bunch of times, and we had not previously said let's make it an episode. But I think it may be a little bit of a pet peeve, not only for us but for other people I talk to in the AI space. We're at this huge hype within GenAI, and people just want to use it for everything that there could possibly be an AI application for. And there are so many places where it doesn't necessarily produce the best outcome for you. We talk about this casually all the time, so I'm glad that we're actually doing this in the show today.
Chris Benson
Yeah, I was creating some docs for a customer of ours, and some training materials, and I have this section just labeled Here Be Dragons. So yeah, there might be some hot takes in here. I'm interested to hear what your takes are. My first one: the number one bad use of GenAI, or maybe one that you want to avoid, at least for now, is maybe a hot take. But I would say from my perspective, completely autonomous agents of any type are currently, and who knows how long this will be the case, but currently and for some time, generally a source of sadness for people when they try to create them. So what I mean by autonomous agent would be an agent or an automation that has no human in the loop, just sort of is running in the background, and you kind of hope that it does something for you. So it could be on the sales side, right? Oh, I'm going to have an agent do my whole sales process for me, and I'm just going to kind of sit back and work on my product, and the agent's going to make all of the sales for me. Or maybe it's some sort of internal admin process that you're automating, or even all the way into manufacturing with automation in plants, or a more industrial case, whatever you're thinking of. My first one is completely autonomous agents. What's your thought, Chris?
Daniel Whitenack
Not only do I think that's right, I'm smiling in a big way because I'm going to throw in something from the side just to support that. Apparently there is a new show on Netflix and I just read about it last night in a Netflix AI is tough for me. The show is called Cassandra, and it's about a home assistant robot, you know, with agency in terms of doing lots of tasks. I have not seen the show yet because I just heard about it, but apparently it gets very, very dark. When you were talking about that just now, in more of a real world scenario, obviously it made me think of that. And so yeah, I agree. A completely autonomous agent in this day and age, with no guardrails around it, where you're just saying go at it? Generative AI especially, especially if it's dealing with anything that has any sort of sensitivity or requires a little bit of thoughtfulness to it. Yeah, not going there.
Chris Benson
Yeah, well. And I think even beyond the kind of security, privacy related things, a lot of times I just see people trying to do this and it just doesn't really work that well.
Daniel Whitenack
Early days.
Chris Benson
Early days, yeah, it's early days. So for those that maybe have or haven't listened to previous episodes, when we're talking about an agent, we mean you give a task to some sort of system, and it has the ability then to generate queries, maybe into other systems like APIs or databases or data stores, to accomplish a certain task. And it kind of loops over that task until it reaches an objective. Right. And in the fully autonomous case, you would have, just using the sales example because it's easy, an agent decide how to find prospects for you on LinkedIn, and then you want to gather a dossier about all of those prospects, and then you want to initiate the contact, and then you want to pull off some type of demo or call, and then you want to close the deal and do the contract arrangement. Right. And just sort of determine how to do every step of that process, basically replacing a human and their agency with the autonomous agent. Now, I think in that case we could say certain portions of that can be very interestingly addressed with AI functionality. So doing the prospecting, generating the dossiers. Right. I would consider those good use cases if they're tied to maybe a sales professional that's deciding how and when to do those things. In the imagination, it would be great to think of just kind of letting that run in the background and you getting sales all the time, but it just doesn't really work very well. There's a lot of fragility in that type of system. When there's a lot of that determination of objectives and determining how to interact with systems and all of these things, it produces a lot of errors, a lot of fragility.
It's much, much more productive, at least currently, for you to have a tool that can help your sales professionals prospect or a tool that can help them create these, you know, dossiers and that sort of thing and certainly tie in AI to that, but not kind of this end to end completely autonomous automation.
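The contrast described here, between an end-to-end autonomous loop and AI steps tied to a person who decides how and when to act, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not anything shown in the episode: `Step`, `AssistedAgent`, and the step names are hypothetical, and the lambdas stand in for real tool calls into APIs or data stores.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str                      # hypothetical step name, e.g. "find_prospects"
    run: Callable[[], str]         # stand-in for a real tool call (API, database, ...)
    needs_approval: bool = False   # True for actions a person should sign off on


@dataclass
class AssistedAgent:
    approve: Callable[[Step], bool]               # human decision hook
    log: List[str] = field(default_factory=list)  # simple audit trail

    def execute(self, steps: List[Step]) -> List[str]:
        """Run steps in order, pausing at any step that needs human approval."""
        results = []
        for step in steps:
            if step.needs_approval and not self.approve(step):
                self.log.append(f"stopped before {step.name}")
                break
            results.append(step.run())
            self.log.append(f"ran {step.name}")
        return results


steps = [
    Step("find_prospects", lambda: "3 prospects found"),
    Step("build_dossiers", lambda: "3 dossiers drafted"),
    Step("send_outreach", lambda: "emails sent", needs_approval=True),
]

# Stand-in for a real reviewer: decline anything that needs sign-off.
agent = AssistedAgent(approve=lambda step: False)
results = agent.execute(steps)
print(results)
print(agent.log)
```

In this sketch the prospecting and dossier steps run unattended, while the outreach step only runs if the `approve` hook returns True, which keeps the sales professional in charge of the part where mistakes are costly.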
Daniel Whitenack
I totally agree with you. And by the way, just as a clarification from what I said earlier, I was not meaning to imply agents would typically have a robotic body. Just in case I confused anybody.
Chris Benson
Lot of people exploring that.
Daniel Whitenack
But just one of the things to note: we're in the rise of agents right now, it's the hottest thing out there. And it's interesting, there are a lot of guardrail mechanisms that are out there. I know in the industry I work in, in defense, especially in things like weapon systems and stuff like that, the DOD has guardrails around such things. So if you're listening and aren't familiar with that, but are a little bit worried about it, fortunately there are people thinking along these lines.
Chris Benson
Yeah, and there are, I would say, useful agents at this point, just not in that fully autonomous kind of setting. So AI systems that can connect to multiple things and maybe are triggered by a human to do certain things. Those are the most successful that I've seen.
Daniel Whitenack
Absolutely.
Chris Benson
Number two from me, Chris. So we've got autonomous agents. Number two for me was time series forecasting, or really any sort of prediction mechanism. So whether that's predicting future stock prices or reasoning over series of data and making predictions. There's some level of prediction that these models can do somewhat well. Maybe it's things like general text classification. Right. Is this message spam or not spam? You can give some examples and you could get some reasonable output from a model like that. That's why I kind of honed in on time series forecasting specifically. At least as far as I know, and I know that there's research in this area, kind of using transformer models for time series forecasting. But when I think of GenAI, I think of I'm going to log into ChatGPT or I'm going to use DeepSeek or one of these models. And if you paste in a bunch of time series data and try to create a forecast just with the GenAI model and nothing else, then I think that's going to end again in sadness for you. It's not going to work so well.
Daniel Whitenack
Yeah, I think so. I actually had that on my list too in the form of high stakes financial trading.
Chris Benson
High stakes financial trading.
Daniel Whitenack
Where do you want to put your million dollars today and see where it goes? So maybe explore some of the possibilities there. But I don't think I would leave it to an agent to forecast or make that prediction on its own.
Chris Benson
Yeah, I think people have shown basically that these models definitely don't have the kind of world understanding, the real world grounding, to take certain steps in reasoning and make reasonable predictions. But also they're generally really bad with numbers. And so you may be able to, even with a vision model, paste in a graph of a time series. Right. And say, what month was my highest sales? If it's a graph of sales. Right. And a vision model could reasonably return that value to you. Right. But then if you say, well, now model out my sales for the next four quarters or something like that, I think generally that's not going to work so well. I guess you could argue that a model could generate code that might use forecasting packages to actually make a reasonable forecast over certain data. Then my general question would be, well, that might be useful to generate your code to do it, but really it's not GenAI that's doing that, it's statsmodels in Python or.
Daniel Whitenack
That's right.
Chris Benson
Or Prophet from Meta and that sort of thing.
Daniel Whitenack
Yeah. And just in case that confuses anyone, there's the generative AI portion which is trained on a general data set, and then there's these models that it might be generating code to access which are designed specifically for that function. So those are two different things.
Chris Benson
Yeah, the code that ends up being executed does not have anything to do with GenAI, basically. Yeah. And maybe it would be worth highlighting in each of these cases that we talk about, Chris, some interesting tooling for some of these things. In the autonomous agents case, certainly workflows and automations can be created and executed. We had Prefect on the show, which is a workflow orchestrator that can be monitored and handle retries and all of that. That's a great thing if you're looking at workflows and orchestration. For time series forecasting, my go-to has usually been Facebook's, or Meta's, Prophet package, which makes certain things pretty easy. But there are also many choices for that as well. So take a look through those things if you're interested in the non-GenAI side.
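To make the distinction concrete, here is a small sketch of the kind of purpose-built statistical code that ends up doing the forecasting once it is written, whoever or whatever wrote it. It is deliberately not Prophet or statsmodels: it is a standard-library seasonal-naive forecast with a crude drift term, swapped in so the example runs without extra packages. The function name and the quarterly sales figures are made up for illustration.

```python
from statistics import mean
from typing import List


def seasonal_naive_forecast(history: List[float], season_length: int, horizon: int) -> List[float]:
    """Repeat the last observed season, shifted by the average season-over-season change."""
    if len(history) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last_season = history[-season_length:]
    prev_season = history[-2 * season_length:-season_length]
    # Average change between matching periods of the last two seasons (a crude trend).
    drift = mean(b - a for a, b in zip(prev_season, last_season))
    forecast = []
    for h in range(horizon):
        cycles_ahead = h // season_length + 1   # how many seasons past the last one
        forecast.append(last_season[h % season_length] + drift * cycles_ahead)
    return forecast


# Two years of made-up quarterly sales.
quarterly_sales = [100.0, 120.0, 90.0, 150.0, 110.0, 130.0, 100.0, 160.0]
forecast = seasonal_naive_forecast(quarterly_sales, season_length=4, horizon=4)
print(forecast)  # [120.0, 140.0, 110.0, 170.0]
```

A baseline like this is also a useful sanity check: if a forecast from any tool cannot beat repeating last year's pattern, it probably should not be trusted with the next four quarters.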
Domo Representative
Well, friends, AI is transforming how we do business, but we need AI solutions that are not only ambitious, but practical and adaptable too. That's where Domo's AI and data products platform comes into play. It's built for the challenges of today's AI landscape. With Domo, you and your team can channel AI and data into innovative uses that deliver measurable impact. While many companies focus on narrow applications or single model solutions, Domo's all-in-one platform is more robust, with trustworthy AI results without having to overhaul your entire data infrastructure. Secure AI agents that connect, prepare and automate your workflows, helping you and your team to gain insights, receive alerts and act with ease through guided apps tailored to your role, and the flexibility to choose which AI models you want to use. So Domo goes beyond productivity. It's designed to transform your processes, helping you make smarter and faster decisions that drive real growth. And it's all powered by Domo's trust, flexibility, and years of expertise in data and AI innovation. And of course, the best companies rely on Domo to make smarter decisions. See how Domo can unlock your data's full potential. Learn more at ai.domo.com. That's ai.domo.com.
Chris Benson
All right, Chris, on to number three. My third one was: do not use GenAI to do complete code rewrites or the complete development of your software applications. Thoughts?
Daniel Whitenack
Oh, I've tried that just playing around and I definitely don't think that that's ready for primetime, despite the fact that as we sit here and say this, there have been quite a few CEO luminaries out there who have been advocating that over the last year or so. And when I sit down and try to do that, I get varying results. And it depends largely on how mainstream a language is, for instance, on how good it is. But I haven't gotten anything that I would say is a production grade program fully functional through nothing but generative AI. Just toy programs.
Chris Benson
Yeah. Without interaction.
Daniel Whitenack
Right, right.
Chris Benson
Yeah. I know this is advancing quickly, so who knows how dated this conversation will be in a few months. But I think we've been talking about this for some time now, and we've seen things like Devin and Cursor and these sorts of things come out, which are pretty amazing and do a lot of really interesting things, but often don't provide that full "I'm going to prompt and get a software application out of it." There's more to it than that. So I think sometimes people are maybe a bit disillusioned. A better way to think about this: there are amazing agents and tooling that have come out, like Devin, Cursor, All Hands, Windsurf, etc., that can provide a huge acceleration in your code development, I think, if you treat them like code assistants and maybe even junior developers that you are pairing with. Right. So it's not so much that I'm a complete non-developer, right, I have no technical skills, and I just say I want this application and it is generated for me. That's really what I'm meaning when I say complete app development. So GenAI, from my perspective, is not capable of that right now, or you should not rely on it for that right now. There may be interesting demos and cases where some form of that is shown, but for the most part, I think thinking of the technology integrated into your code and programming as an assistant, and even a highly functioning agent that you can pair with, is a good model. Just not the kind of, I guess maybe it's a specialization of the autonomous agent thing that I mentioned before, sort of.
Daniel Whitenack
And I think you're making really good points in that. You can't just toss it over the wall and say here's an instruction, do it all, and generate a complex set of programs and stuff. I have done small tasks very successfully, but the scope of what they were addressing was constrained. And I think we are there for things like that, doing small bits. Many years ago I would write VBA code, Visual Basic for Applications, for Microsoft stuff. I don't much anymore. And so now I can generate something like that if I happen to be working on something in Office, to put something together at work. But when I'm actually coding up a large project, it's very helpful to have different tools for this, but I've not found one yet that was able to successfully do a significant coding effort by itself, just tossing it over the wall. So I agree with you completely. It will be interesting to see where we are a year from now, two years from now.
Chris Benson
Yeah, well, definitely. I would encourage people to check out things like Windsurf and Devin and All Hands and Cursor and all of these things. Super cool, try them out. But don't expect that if you're not a programmer, or don't have at least some minimal level of skill, you're going to create a huge application or project with all of its intricacies and have that work and scale.
Daniel Whitenack
Well, fair enough.
Chris Benson
All right, Chris, what are we on? Number four for me on the list of don't do this with GenAI, or bad GenAI use cases, is anything extremely high throughput, low latency. So of course small models and very high throughput advances have taken place with GenAI models. But still, if you're doing quality assessment of products coming off of an actual scaled-up manufacturing line, where you have to do maybe the assessment of each of those products in a fraction of a second, really, you don't want to be reasoning over that data with a GenAI model and take 10 seconds to generate your quality assessment for the product. It's just not feasible.
Daniel Whitenack
I would agree with that. And I actually have a subset that I'll throw in on that, that I think kind of fits in there, which would be kind of like real time applications with critical outcomes.
Chris Benson
That's a great way to phrase it.
Daniel Whitenack
I think that's an area where you may have generative AI as a component in that mix, but you're going to have to have some guardrails around it, and you're going to have to have some specialized models to keep things on track. Because in a real time app where things matter on the tail end, it's great to use, but you don't want to rely entirely on that. When it goes off the rails, you need some way to catch it that doesn't take any time.
Chris Benson
And I think you make a couple of great points. Part of it is around the latency, which I kind of highlighted. These models just don't operate fast enough, and they don't necessarily operate in the types of environments that you need them to operate in for these types of maybe edge use cases as well, in many cases. But also these models do what they are supposed to do most of the time. Right, but still, if you train a computer vision model, for example, to do that manufacturing task, that could run on CPU at extremely high throughput and have a much higher accuracy than any generalized vision model out there, and even that generalized model would need a GPU to run. Right, I agree with that. Yeah. So the separation between those two cases is still just really, really high in terms of those kinds of use cases merging. Now I do think that in a manufacturing scenario, right, or any of these other high throughput, critical types of scenarios that you might think of, GenAI is very useful, maybe just not for that high throughput, low latency piece, but certainly for staff at the manufacturing facility that want to look at and analyze the data coming off of the quality assessment system and ask questions like, hey, I see this alert, pull this data for me to help me understand what's going on, or are there any of these types of events that have happened in the past X time? And that query level side via natural language can be very powerful, for example, and there are many other things that you could do in those scenarios, but there, there is.
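One way to picture the split described here is a fast, deterministic check in the per-part hot path, with anything slower (including a generative model, if one is used at all) kept off the line and reserved for staff who want to ask questions about what was flagged. The sketch below is only an illustration under stated assumptions: the `Part` fields, the tolerance values, and the `summarize_for_operator` placeholder are hypothetical, and the simple threshold stands in for a specialized vision model.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Part:
    part_id: int
    width_mm: float   # hypothetical measurement taken on the line


def fast_quality_check(part: Part, target_mm: float = 50.0, tolerance_mm: float = 0.5) -> bool:
    """Cheap deterministic check, standing in for a specialized model that runs in milliseconds."""
    return abs(part.width_mm - target_mm) <= tolerance_mm


def inspect_line(parts: List[Part], review_queue: Deque[Part]) -> List[int]:
    """Hot path: decide pass/fail per part; failures are queued for later analysis."""
    passed = []
    for part in parts:
        if fast_quality_check(part):
            passed.append(part.part_id)
        else:
            review_queue.append(part)   # no slow model call happens here
    return passed


def summarize_for_operator(review_queue: Deque[Part]) -> str:
    """Off the hot path. A natural language query layer could sit here; this is a plain-text placeholder."""
    ids = ", ".join(str(p.part_id) for p in review_queue)
    return f"{len(review_queue)} part(s) flagged for review: {ids}"


queue: Deque[Part] = deque()
passed_ids = inspect_line([Part(1, 50.2), Part(2, 51.3), Part(3, 49.9)], queue)
print(passed_ids)
print(summarize_for_operator(queue))
```

The design choice is that nothing in `inspect_line` waits on a slow call, so the per-part decision time stays bounded no matter how the review side is implemented.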
Daniel Whitenack
I'll extend this just a little bit. As you know, my personal passion is in autonomous platforms, especially at massive scale, swarming, things like that. And when you talk about that, one of the areas where I think GenAI does play is exactly the equivalent of what you just said on the manufacturing side, and that's having a human in the loop or on the loop that's able to interact. And so you're using GenAI to actually be able to enhance the communication with the human who is in control or on the loop and able to step in. But not so much in the other areas, especially considering that when you have lots of vehicles, and this could apply for lots of different use cases both in the commercial space and the military space, you have a lot of different platforms or vehicles in communication, which requires high throughput. But yeah, I think that the only space there that is a big one is in those interactions with the humans that are involved in that. For safety.
Chris Benson
Yeah, for sure. Well, I have one more, Chris, a last interesting bad use case for GenAI. The one on my list was anything outside of the major languages of the world. So anything with any sort of linguistic diversity or cultural diversity. Essentially the models of the modern GenAI era maybe work well in the top five to 10 languages of the world, but there are 7,000 spoken languages in the world, which means they basically don't work for any of the languages of the world except for a couple. And moreover, the cultural context of the models is driven mostly by what has been gathered either from the Internet or by Western tech companies, maybe Chinese tech companies. But there's certainly a bias against certain cultural contexts and languages, and even if you think about vision or video models, I'm sure the same is true. Right. Because just certain things aren't represented there. So the reality is that it would be great if you could land anywhere in the world and change your ChatGPT or whatever to help you interact in X country in Africa or Y country in Asia and have that work really well with whatever languages you might encounter. But I would say generally that's not the case as of now, I think.
Daniel Whitenack
So I know you haven't mentioned it yourself, but longtime listeners who have been with us for years will know that you used to be in that space in a former professional life and know quite a bit about this topic that you've just brought up. So. Yeah, I agree. It's definitely. I don't think that's changed substantially over the last few years.
Chris Benson
Yeah, and even simple things. I mean, it has to do with GenAI, but it also has to do with the tooling around it. Right. Even in terms of other scripts, Arabic for example, which of course is a major language of the world, and which at least some models can do reasonably well to some degree. The tooling around the GenAI ecosystem, right, like, oh, I want to download this chat SDK or this UI that I can plug a custom model into, is likely not going to support right-to-left. Potentially there are going to be some issues with the script and other things. So it's just another highlight of this disparity that exists, and I think it's worth highlighting because mostly what we're talking about here is language models, and really language models that support a very small number of the languages on the planet. So yeah, that's what I had. Chris, any thoughts after going through the list of bad use cases?
Daniel Whitenack
I do have a few thoughts there. I think one of the things that I've noticed is that there are high risk cases, where you have significant outcomes that can affect people in a major way, whether it be financial or manufacturing or my industry with defense or whatever. You don't want to put a general generative AI model in charge of doing things for which there are no guardrails. I think that is a thing that I have noticed across a lot of these, and I could throw out a couple of other areas where I think that applies, like high stakes legal advice. Do you have great tooling within things like ChatGPT and the other big language models for legal advice? Yeah, but would you really want to literally put your life savings at risk with things like that? Maybe not. Maybe not today, at least. You see a lot of AI pervading medical diagnosis, and once again, I think there's a very good use for those, but probably not by itself, in isolation. So in any of these areas where you have a substantial risk in the outcome, in terms of good and bad, you probably want to have guardrails around it, across many, many different industries. I think that's my takeaway. And I think that things are continuing to improve at a really, really rapid pace. We've said things and two months later had the world change out from under us, and that may happen again here with some of these. But yeah, we're on the learning curve with these things and they're getting better, but they're not all the way there yet.
Chris Benson
Yeah, I think that's a great way to summarize. Chris, thanks for. Thanks for chatting through the things with me, and we'll look forward to carrying on the conversation very soon with you.
Daniel Whitenack
Sounds good.
Podcast Host/Announcer
All right. That is our show for this week. If you haven't checked out our Changelog newsletter, head to changelog.com/news. There you'll find 29 reasons. Yes, 29 reasons why you should subscribe. I'll tell you reason number 17: you might actually start looking forward to Mondays.
Daniel Whitenack
Sounds like somebody's got a case of the Mondays.
Podcast Host/Announcer
28 more reasons are waiting for you at changelog.com/news. Thanks again to our partners at Fly.io, to Breakmaster Cylinder for the beats, and to you for listening. That is all for now, but we'll talk to you again next time.
Practical AI – Feb 24, 2025
Hosts: Chris Benson (Principal AI Research Engineer, Lockheed Martin), Daniel Whitenack (CEO, Prediction Guard)
In this episode of Practical AI, Chris Benson and Daniel Whitenack dive into the "cold side" of generative AI, focusing on situations where GenAI is overhyped, misapplied, or simply not ready for real-world productivity. They share a series of "hot takes" on misuse cases and discuss why, despite the buzz around generative AI, there are critical limitations – both technical and practical – that make certain applications unreliable or risky. With a candid and insightful approach, they turn their years of practical AI experience into a must-listen cautionary guide for anyone considering deploying GenAI.
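[02:00–09:10] Fully autonomous agents
[09:11–13:43] Time series forecasting and prediction
[15:04–18:05] Complete code rewrites and full application development
[19:40–23:16] High-throughput, low-latency applications
[24:17–27:43] Languages outside the major languages of the world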
"Currently and for some time, [autonomous agents] are generally a source of sadness for people when they try to create them."
— Chris Benson [02:36]
"In the imagination, it would be great to think of just letting that run in the background and you getting sales all the time, but it just doesn’t really work very well. There’s a lot of fragility in that type of system."
— Daniel Whitenack [07:00]
"If you paste in a bunch of time series data and try to create a forecast just with the GenAI model and nothing else, then I think that’s going to end again in sadness for you."
— Chris Benson [09:38]
"I haven’t gotten anything that I would say is a production grade program fully functional through nothing but generative AI. Just toy programs."
— Daniel Whitenack [15:21]
"You don’t want to put a general generative AI model in charge of doing things for which there are no guardrails."
— Daniel Whitenack [27:43]
Episode takeaway:
Generative AI is a powerful tool, but it’s not a panacea. Know its limits, find productive integrations, and always supplement it with robust processes and human judgment—especially when outcomes matter most.