Practical AI Podcast
Episode: "AI at the Edge is a different operating environment"
Date: March 25, 2026
Hosts: Daniel Whitenack & Chris Benson
Guest: Brandon Shibley (Edge AI Solutions Engineering lead, Edge Impulse/Qualcomm)
Episode Overview
This episode dives deep into the current state and unique challenges of deploying artificial intelligence at the edge—in devices and environments outside the traditional cloud or data center. The hosts and guest explore the evolving definition of "the edge," advances in small and efficient AI models, the increasing practicality of edge computing, toolsets and hardware for developers, real-world use case considerations, and future directions. Brandon Shibley shares specialized insight as a leader in edge AI, blending technical depth with pragmatic advice.
Key Discussion Points & Insights
1. Defining AI at the Edge (02:27)
- "The edge" broadly includes anything not in the cloud. It encompasses “far edge,” “near edge,” “edge of network,” and any device or system embedded near where real-world data is sensed or acted upon ([02:27], Brandon).
- Past and present definitions vary, but the key is proximity to the source of data and distance from centralized computation.
2. Trends and Shifts in Edge AI Models (05:45)
- Explosion in both directions:
- Models are getting bigger in the cloud, and smaller (but more capable) at the edge.
- Small language models (SLMs), often in the range of single-digit to tens of billions of parameters, can now run on smart edge appliances with powerful modern NPUs or GPUs ([05:45], Brandon).
- Specialization at the edge: Smaller models are most effective when fine-tuned for domain-specific tasks rather than general knowledge ([07:40], Brandon).
- Model composition: Increasingly, edge solutions use ensembles or cascades of specialized models, balancing capability and resource efficiency.
3. Unique Operating Constraints of the Edge (09:17)
- Constraints:
- Size, power, cost, and (un)reliable connectivity are acute challenges ([09:17], Brandon).
- Privacy is both a challenge and an opportunity: keeping sensitive data close to its origin prevents exposure to cloud or Internet ([10:30], Brandon).
- Latency and deterministic performance are critical for applications like robotics or manufacturing.
Quote:
"These constraints are what we have to live and die by at the edge. Size, power, connectivity... we're also dealing with cost constraints."
— Brandon ([09:17])
4. Physical AI vs. Edge AI (12:21)
- Physical AI often refers to systems that not only sense/predict but also take action in the physical world (e.g., robotics, vehicles).
- Overlap is large: Physical AI is a subset of edge AI but brings in the "actuation" element ([12:21], Brandon).
5. Latency and Real-Time Demands (14:21)
- Response time ("real-time") must fit the task:
- Microseconds in manufacturing or driving
- Milliseconds in robotics
- Seconds for chat interfaces
- Deciding where models run depends on the necessary latency, available compute, and communications realities ([14:21], Brandon).
6. Model Cascades and Pipelines (17:26)
- Edge solutions often use multi-stage pipelines:
- Lightweight, efficient models (like object detectors, e.g., YOLO) run first, filtering out most data.
- More complex or heavy models (like VLMs or LLMs) engage only when something interesting is detected, saving power and compute.
- Example: Object detector filters frames; only frames with objects pass to a vision-language model for deeper analysis. This pattern applies to audio and sensor data, too ([17:26], Brandon).
Quote:
"What we'll do in many cases is have this pipeline or cascade where on the front end is some kind of very initial detection that can be done very efficiently... and then when we see an object that looks of interest... we can do much deeper or more dynamic analysis."
— Brandon ([17:26])
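The cascade pattern Brandon describes can be sketched in a few lines. This is a minimal illustration with stubbed-in models, not Edge Impulse code: `run_detector` and `run_vlm` are hypothetical placeholders standing in for a lightweight object detector and a heavyweight vision-language model.

```python
# Minimal sketch of a detect-then-analyze cascade.
# run_detector / run_vlm are hypothetical stubs, not real model APIs:
# in practice the first stage would be e.g. a YOLO-class detector and
# the second a VLM, each behind its own inference runtime.

def run_detector(frame):
    """Cheap first-stage model: flags frames with anything of interest."""
    return bool(frame.get("objects"))

def run_vlm(frame):
    """Expensive second-stage model: invoked only on flagged frames."""
    return f"analyzed {len(frame['objects'])} object(s)"

def cascade(frames):
    """Run the cheap gate on every frame; run the heavy model rarely."""
    results = []
    heavy_calls = 0
    for frame in frames:
        if run_detector(frame):           # runs on 100% of frames, cheaply
            results.append(run_vlm(frame))  # runs only when gated in
            heavy_calls += 1
    return results, heavy_calls
```

The power savings come from the ratio: if only a small fraction of frames contain objects of interest, the heavy model's duty cycle (and energy draw) shrinks proportionally.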
7. Advances in Tooling and Frameworks (21:36)
- State of tooling: Edge Impulse is highlighted as an abstraction layer that lets developers work with diverse hardware, managing data, model training, optimization, and deployment from one place.
- Fragmentation at the edge (unlike cloud, where Nvidia dominates) requires more sophisticated tools to support many hardware targets.
- Portability and optimization are essential—Edge Impulse offers target-aware conversion for different chips ([21:36], Brandon).
8. Agency, Autonomy, and MLOps at the Edge (24:22)
- Agency at the edge: More systems act and plan, not just infer ([24:22], Chris).
- MLOps & model management: Edge deployments require not only runtime efficiency but solutions for data acquisition, drift adaptation, model updates (often over intermittent connectivity), and control of distributed devices ([24:22]–[26:54], Brandon).
- Over-the-air updates and centralized aggregation are best where possible for governance.
9. Progress in Small/Medium Models & Their Effectiveness (28:13)
- Techniques like knowledge distillation and fine-tuning extract specialized knowledge into smaller, more efficient models tailored for edge tasks ([29:38], Brandon).
- TinyML and ultra-low-power microcontrollers continue to make specialized ML possible even for wearable devices.
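The distillation technique mentioned above can be made concrete with the standard soft-target loss (the temperature-scaled KL divergence from Hinton et al.'s distillation formulation). This is a generic sketch for illustration, not anything specific to the episode or to Edge Impulse's tooling:

```python
# Sketch of the knowledge-distillation soft-target loss: the student is
# trained to match the teacher's temperature-softened output distribution.
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2
    so gradients keep a consistent magnitude as T varies."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return T * T * kl
```

In practice this term is combined with an ordinary cross-entropy loss on the true labels, and the small student model inherits much of the large teacher's task-specific behavior at a fraction of the parameter count.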
10. Hardware Evolution and Vertically Integrated Approaches (31:44)
- Edge Impulse's opinionated approach: Abstraction + target-optimized deployment supports edge diversity and leverages Qualcomm NPUs for efficiency ([32:45], Brandon).
- Hardware advances: Specialized processors (NPUs, DSPs, ISPs) dramatically improve operations/watt, enabling richer on-device intelligence ([36:46], Brandon).
Quote:
"What we've previously been able to do, we can just keep building on... Once you have AI in the tool chest, it kind of just broadens the perspective of, like, what could the world be like if we put intelligence right where the data's at?"
— Brandon ([36:46])
11. Accessible Getting Started Advice for Developers (40:09)
- Start with problems you care about: Home automation, custom sensors, pet feeders, etc.
- Low-cost maker hardware is widely available (e.g., Arduino). Free tools like Edge Impulse help rapidly prototype real-world edge AI projects.
- Enterprises often begin with maker setups before scaling to robust hardware ([40:09], Brandon).
Quote:
"It's amazing that so many of these things are readily achievable with commodity like maker hardware that's out there. That's a great place to start."
— Brandon ([40:09])
12. Looking Ahead: The Edge’s Future (43:35)
- Vision: As power, compute, and cost constraints shrink, intelligence could be embedded everywhere—much like biological intelligence is co-located with sensors in living organisms ([43:35], Brandon).
- Anticipate more robotics, intelligent action models, and “edge-native” insight/actions transforming both mundane and extraordinary facets of life.
Quote:
"What if power and cost and compute, they basically kind of go to almost zero... it means that we could put intelligence literally anywhere right at the edge... What I see is we're going to continue bringing models to the edge, more of them."
— Brandon ([43:35])
Notable Quotes & Memorable Moments
- On edge constraints:
“These constraints are what we have to live and die by at the edge. Size, power, connectivity... cost constraints.” — Brandon ([09:17])
- On real-time needs:
“The application really drives home the requirement. It all comes down to what is the requirement for the type of behavior we're trying to get out of the system.” — Brandon ([14:21])
- On cascades of models:
“If you were using a large language model... and running it continuously on every frame that came through, it's a very quick way to burn through a lot of power.” — Brandon ([17:26])
- On developer accessibility:
“It's amazing that so many of these things are readily achievable with commodity like maker hardware that's out there. That's a great place to start.” — Brandon ([40:09])
- On the edge’s future:
“What could the world be like if we put intelligence right where the data's at?” — Brandon ([36:46])
Timestamps of Key Segments
- Intro & State of Edge AI: 00:41–04:38
- Edge Model Trends (SLMs, LLMs): 05:45–08:30
- Edge Constraints & Opportunities: 09:17–11:51
- Physical AI vs Edge AI: 11:51–13:21
- Latency & Location of AI: 13:21–15:57
- Model Cascades/Pipelines: 17:26–21:36
- Tooling/Frameworks: 21:36–24:22
- Agency, MLOps at the Edge: 24:22–28:13
- Small Model Techniques: 28:13–31:44
- Edge Impulse & Hardware: 31:44–36:46
- Battery Power & Capability: 36:46–39:01
- Getting Started in Edge AI: 40:09–42:16
- Looking Ahead/Future Vision: 43:35–45:40
Final Reflections
The episode showcases how edge computing is now poised to deliver on the promise of real-world AI: making systems smarter, faster, and more private by operating close to where data originates and actions are taken. With maturing hardware, advanced tooling, and developer accessibility, the barrier to entry is lower than ever—enabling both enthusiasts and enterprises to innovate “at the edge.” The future points toward intelligence everywhere—not just in the cloud, but ambiently embedded across the physical world.
For further exploration or to get started, check out EdgeImpulse.com and experiment with affordable hardware like Arduino, as encouraged by the guest.
