Summary

Embracing Digital Transformation

Episode: Small Language Models: The Public Gen AI Killer?

Host: Dr. Darren Pulsipher

Guest: Lynn Kampf, Intel Corporation

Date: October 30, 2025

Episode Overview

This episode dives into a major trend shaping enterprise AI: the rise of Small Language Models (SLMs) and their potential to outpace public, generic generative AI in practical, scalable business applications—especially for the public sector. Dr. Darren Pulsipher and returning guest Lynn Kampf (Intel Corporation) break down the hype, challenges, and actionable ways for organizations to benefit from SLMs, focusing on efficiency, risk reduction, and real-world use cases.

Key Discussion Points & Insights

1. Enterprise AI Benchmarks and Token Confusion

Token Metrics Explained [(01:35–05:03)]
- Mainstream AI benchmarks (like tokens per second) are often misleading for business users.
- Lynn Kampf:
  
  “Tokens are a great supply-side metric because it's very similar to megahertz. How quickly can you get bits out? ... But we're not really translating what that means to humans.” (02:12)
- High token throughput, like speed in a race car, isn’t always needed for human-facing use cases.
- Critical question: Does faster output provide real business value (e.g., in fraud detection) or just higher cost?
Analogy:
- “Ferraris are amazing on racetracks ... Ferraris that are driving through a school zone to get groceries, not very good. Think of human consumption as the school zone ... we don't operate at the speed of light, the machines do.” [(03:30)]
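The school-zone point can be sketched with a little arithmetic. This is an illustrative model, not something presented in the episode: it assumes a person consumes a streamed answer at roughly 20 tokens per second (a figure mentioned in the conversation, treated here as an assumption), and the function name and the 400-token response length are hypothetical.

```python
def seconds_until_consumed(response_tokens: int,
                           generation_tps: float,
                           human_read_tps: float = 20.0) -> float:
    """Time for a person to finish reading a streamed response.

    A reader cannot consume text faster than they read, so the effective
    rate is the slower of the generation rate and the reading rate.
    human_read_tps is an assumed value, not a measured one.
    """
    effective_tps = min(generation_tps, human_read_tps)
    return response_tokens / effective_tps


if __name__ == "__main__":
    # Hypothetical 400-token answer at three generation speeds.
    for tps in (10, 20, 150):
        print(f"{tps} tokens/s -> {seconds_until_consumed(400, tps)} s")
```

Under these assumptions, generating at 150 tokens per second finishes no sooner for the reader than generating at 20; the extra throughput only matters when a machine, not a person, is the consumer.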

2. Cost, Infrastructure & Efficiency: Why SLMs Matter

Hidden Costs of High-Volume AI
- More tokens per second = higher compute costs and potentially specialized, expensive hardware [(05:08)].
- Most enterprise IT is only ~40% utilized, with the rest essentially “sitting idle for backup.”
- SLMs let organizations leverage existing infrastructure (CPU/GPU), minimizing disruption and cost.
Tailored, Vertical Models Add Value
- SLMs/vertical models are designed for specific industries, offering efficiency where general models (e.g., GPT-4) offer little extra value:
  
  “If your business is not needing to write haiku, input Shakespeare and create video images for fun, then you may not get value out of the biggest punches.” (Lynn Kampf, 06:48)
Strategic Start:
- Begin with small, impactful use cases (summarization, policy analysis, data sync) and scale as demand grows.
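As a rough illustration of the idle-capacity argument, the sketch below estimates how many human-paced sessions spare capacity might serve. Only the ~40% utilization figure comes from the episode; the node count, the per-node throughput, and the per-session rate are hypothetical placeholders that would need to be measured on real infrastructure.

```python
def spare_sessions(nodes: int, utilization_pct: int,
                   node_tps: int, session_tps: int) -> int:
    """Estimate concurrent human-facing sessions served by idle capacity.

    nodes           -- number of servers in the existing fleet (hypothetical)
    utilization_pct -- share of the fleet already busy, in percent
    node_tps        -- assumed tokens/second one fully idle node can generate
    session_tps     -- assumed tokens/second one human-paced session needs
    """
    idle_pct = 100 - utilization_pct
    # Integer arithmetic keeps the estimate exact and conservative.
    spare_tps = nodes * node_tps * idle_pct // 100
    return spare_tps // session_tps


if __name__ == "__main__":
    # 32 nodes at 40% utilization, 50 tokens/s per node, 20 tokens/s per session.
    print(spare_sessions(32, 40, 50, 20))
```

The point of the sketch is the shape of the calculation, not the numbers: start from capacity already paid for, and only then ask whether demand exceeds it.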

3. Pragmatic AI Adoption in the Enterprise

Where Most Fail [(09:08–10:22)]
- Refers to a recent MIT report: “95% of Gen AI business PoCs are failures.”
- Cause: Rushing to use AI “for the sake of AI,” poor data preparation, or not integrating with existing business data.
- Quote:
  
  “Most enterprises ... if it's a mature stable tech stack and it's got disaster recovery, you leave that sucker alone. You're not just going to go add new technology because it's new.” (Lynn Kampf, 09:26)
Avoiding Data Silos
- Implementing separate AI systems can create data silos and duplicative costs.
- Best Practice:
  
  “Prototype small and tight. Take advantage of the extra capacity you have and then really judge whether new hardware is necessary based on—will I get [more] if I'm running faster, or will I decrease my liability?” (10:53–12:41)
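The hardware question in that quote can be written down as a simple check. This is a minimal sketch of the reasoning, with hypothetical names and made-up dollar figures; it is not a costing method described in the episode.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    extra_revenue_from_speed: float    # revenue gained only because responses are faster
    liability_avoided_by_speed: float  # expected liability avoided only because of speed


def new_hardware_justified(use_case: UseCase, hardware_cost: float) -> bool:
    """Return True when the return on acceleration exceeds the hardware cost."""
    return_on_acceleration = (use_case.extra_revenue_from_speed
                              + use_case.liability_avoided_by_speed)
    return return_on_acceleration > hardware_cost


if __name__ == "__main__":
    # All amounts are illustrative placeholders.
    fraud = UseCase("real-time fraud detection", 500_000.0, 250_000.0)
    policy = UseCase("internal policy chatbot", 0.0, 0.0)
    for case in (fraud, policy):
        print(case.name, new_hardware_justified(case, 300_000.0))
```

A human-paced use case with no return on acceleration never clears the bar, which is the argument for prototyping on existing capacity first.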

4. Concrete SLM Use Cases

Summarization & Knowledge Management [(12:56–14:56)]
- Example: Internal tools like “beta”—a searchable database of support calls and solutions.
  
  “Having something very quick that summarizes—is somebody else seeing this, what is the possible part?” (13:08)
- Call centers benefit from SLM-empowered search and summarization—driving efficiency and reducing operational costs.
Policy Analysis & Expense Auditing
- Internal chatbots surface travel or HR policies instantly.
- SLMs audit expense reports by finding “out of sorts” cases for human review.
  
  “It just helps the human not have to do as many actions.” (15:28)
Retaining the Human Element:
- SLMs and vertical models support decision-making but don't replace judgment.
  
  “AI is not going to be [a] good judge. AI is going to help very quickly collect information ... so that you can use judgment and critical thinking.” (16:07)

5. The Impact on Workforce & Skills

Upskilling for AI-Augmented Roles [(16:43–19:43)]
- Humans will focus less on “glue” work (manual data movement/interpretation) and more on judgment and recommendations.
- Dr. Pulsipher:
  
  “We as humans have to train ourselves more on critical thinking, art, mitigations. ... We've been relying too much on ... gathering all the data, I don't know what I'm supposed to show.” (18:10–18:44)
The New AI-Partnered Paradigm
- Employees become “AI augmented,” not replaced—maximizing business intelligence rather than repetitive clerical work.
- Kampf’s Quip:
  
  “Blind AI is do stupid stuff ... at the speed of light. So you just need that human [to] accelerate your judgment.” (19:43)

6. Architecture and Hybrid AI Environments

Hybrid Is the Future [(20:44–22:18)]
- SLMs and vertical models will run across cloud and on-prem, requiring resilient, adaptable architectures.
- Avoid “flat lift and shift” mistakes from early cloud adoption; apply lessons learned for cost and resilience.
  
  “If I architect appropriately, then I'm not affected by outages. I'm not affected by data breaches.” (21:29)
Owning Your Output
- When you use public chatbots without an enterprise context, you don't own the copyright to the output; vertical, context-specific models keep knowledge in-house.

Notable Quotes & Memorable Moments

Ferrari Analogy (Tokens vs. Business Value):

“Ferraris are amazing on racetracks ... but not very good in a school zone ... we don't operate at the speed of light, the machines do.” (Lynn Kampf, 03:30)
On Failed GenAI Projects:

“95% of Gen AI business PoCs are failures.” (Dr. Pulsipher, referencing MIT, 09:08)
On Human Judgment:

“AI is not going to be [a] good judge. AI is going to help... get information so you can use judgment and critical thinking.” (Lynn Kampf, 16:07)
Human Upskilling:

“We as humans have to train ourselves more on critical thinking, art, mitigations.” (Dr. Pulsipher, 18:10)
AI-Augmented, Not AI-Replaced:

“AI augmented, right? Not AI replaced. If you value your employees and your company then you want more intelligence at the table.” (Dr. Pulsipher, 19:17)

Key Timestamps

| Time | Segment |
|-------------|--------------------------------------------------|
| 01:35–05:03 | Token confusion & performance vs. value |
| 06:22–07:38 | Using current infrastructure with SLMs |
| 09:08–10:22 | Why most GenAI business PoCs fail (MIT study) |
| 12:56–14:56 | Use case: Summarization and internal knowledge |
| 15:24–16:39 | Use case: Policy chatbots, expense audit |
| 16:43–19:43 | Skills for an AI-augmented workforce |
| 20:44–22:18 | Hybrid architecture for SLMs and vertical models |

Summary Takeaways

- SLMs are a pragmatic, cost-effective way for enterprises (especially in the public sector) to leverage GenAI without the cost and risks of massive public models.
- Start small, prototype, and align with concrete business needs—avoid creating new silos and focus on augmenting (not replacing) human intelligence.
- Upskill your workforce for the next era: judgment, synthesis, and critical thinking are more relevant than ever.
- Hybrid, resilient architectures are vital—carry forward lessons from cloud migration into enterprise AI adoption for security, copyright, and organizational control.

“The sweet spot is to be strategic: let the tactics run through automation ... adopt what’s appropriate, and you’ll make smarter decisions and elevate your business.”
— Dr. Darren Pulsipher (20:12)

For further resources and deeper dives, visit embracingdigital.org.


Transcript

[00:00]
A
Huge heterogeneous compute, which means we can have the same operating pool that is doing general ledger or back-office processing and also doing inference or ChatGPT-like things in our own data source.
[00:16]
B
And then you could be benefiting from things that are small language models or vertical models, which are specific to your business industry, which are also adding efficiency.
[00:29]
C
Welcome to Embracing Digital Transformation, where we explore how people, process, policy, and technology drive effective change. This is Dr. Darren, chief enterprise architect, educator, author, and most importantly, your host. On this episode: Small Language Models and Enterprise, the Public Gen AI Killer, with returning guest Lynn Kampf from Intel Corporation.
[01:01]
A
Lynn, welcome back to the show.
[01:02]
B
Hi Darren. I'm glad to be back.
[01:04]
A
I would ask what your superpower is, but we already know.
[01:07]
B
Oh yes, you do.
[01:08]
A
Yeah, we do. My opera-singing VP, right? I looked back at all my episodes. You're the third opera singer I've had on my show.
[01:17]
B
No, he mentioned that.
[01:19]
A
It's crazy. So there's something about music and high tech.
[01:23]
B
I hear you. I agree. If not.
[01:25]
A
But yeah, but we're not going to talk about opera today, even though that might be an interesting show one of these days. Let's talk about a big, huge trend that we're seeing in AI and the enterprise.
[01:36]
B
Yeah. It is such a big transformation and people are really struggling with how to view it. There are a lot of benchmarks in the industry that don't necessarily translate into what that means for my business and my business operations. So there's a lot of really good meat to talk about.
[01:56]
A
Yeah. So let's talk about the benchmarks first, because this one confuses a lot of people. All right, tokens. That's all we ever hear about: tokens per second, or how many tokens is that query or that prompt going to take? What in the world? How does that apply to enterprise?
[02:12]
B
You know, tokens are a great supply-side metric because it's very similar to megahertz. How quickly can you get bits out? And so it's a very easy translation for hardware providers: I can do this number of tokens. And, very typical for the tech world, you end up with more is better, faster is better. But we're not really translating what that means to humans.
[02:38]
A
Well, except I know, right, when I'm on ChatGPT myself, that I run out of tokens. I know that.
[02:46]
B
That's right. That's right. Because the subscriptions limit the number of tokens. Right. And then it comes back and says, oh, you can start again tomorrow.
[02:54]
A
Yeah, exactly, exactly. Because I'm limited. Right. So tokens are things that are consumed, unlike gigahertz or megahertz. Right?
[03:02]
B
That's true.
[03:03]
A
So how does that play in enterprise? Or does it then have a play in enterprise at all?
[03:10]
B
A lot of it depends on whether the use case is something that is interacting with humans, because humans really cannot read more than 20 tokens per second. It's essentially how quickly does the chatbot respond to you? Slower means fewer tokens per second. Faster means more.
[03:29]
A
Okay, that makes sense.
[03:30]
B
And so it comes down to response times. Now here's the thing that's important. You have to understand what are the use cases where it is mission critical and I will either get paid more for faster response times or I will have a greater chance of liability for slower response times, because there's a return on acceleration. So let me give you an analogy. Ferraris are amazing on racetracks. They have the best throughput in race conditions. Ferraris that are driving through a school zone to get groceries, not very good. Think of human consumption as the school zone and going to the grocery store, because we don't operate at the speed of light, the machines do. So it depends on whether your use case is something like real-time fraud detection, where it's really important to have milliseconds, and you either get paid for doing that as upside or you incur a liability for not doing that, depending on your business. That's really important to think about: how many tokens can I possibly get, and how quickly? Now if you're dealing with humans that are checking legal audits or expense audits, where the system scans all of these, the agents are working together, and the human has to look at the observations in that agent workflow, it could be 10 tokens per second. It could be 150 tokens per second. 150 will not get you any more than 10.
[05:04]
A
So yeah, but what's the downside of going with a higher token rate?
[05:08]
B
Well, it's basically a question of can you afford to ask also to teach. That's a higher cost. More tokens is higher cost. It can represent you when you're using a chatbot. More tokens can mean you get less consumption or you have to pay more if it needs a tsurushi. More tokens in an enterprise context generally means you're on specialized hardware that is not part of your current tech stack, that requires its own power supply, sometimes liquid cooling. Liquid cooled or air cooled. Do you have to lots and yields restaurants. So it really comes down to don't do something nice. Look for the use cases that are manual annoying fire cross checking, and if it's a human in the loop, then look for what can be. I think you and I were talking about green AI. Use what you've got. You have a whole fleet, and enterprises generally are about 40% utilized, just as a precaution for backup and in case there's a peak. That's 60% of your fleet you could probably use for some of these human use cases where speed is not life or death. Speed is not upside at all.
[06:23]
A
So this is really interesting, because what you're saying is we can take advantage of our current enterprise architecture to get our rehab. We don't have got through our data centers. We don't have that huge heterogeneous compute, which means we can have the same operating pool that is doing general ledger or back-office processing and also doing inference or ChatGPT-like things in our own data source.
[06:49]
B
And then you can be benefiting from things that are small language models or vertical models, which are specific to your business industry, which are also adding efficiency. Because if your business is not needing to write haiku, input Shakespeare and create video images for fun, then you may not get value out of the biggest punches. So really understanding what this is going to add to my business: will I get more revenue, or will I reduce my liability risk? Those are really, really important.
[07:38]
A
Okay, so where do we start with this? Because we've talked about publishing AI, Vulcan AI, all this Brock, all the ones who are out there fob you're talking different. You're talking private JD and so running my own language models on my own gaming center stuff, small language models can run on just CPUs. So where do I start? How do I start taking, where do I go to make that happen?
[08:06]
B
So I think the first thing would be to start experimenting with some basic use cases. The recommendations I tend to make are data processing, data synchronization, because that takes advantage of something called RAG. We throw these terms around, but think of it as: what did that document say? So you can make very quick judgments, because AI doesn't do judgment. AI just summarizes.
[08:28]
A
InfiniteX summarize it in. I love how you said that because a lot of people think, oh, AI is going to tell me what to do.
[08:35]
B
Yeah, well, unfortunately.
[08:37]
A
But if you can give me information so I could make a replication and see. Okay, so there's a big difference between the Two.
[08:45]
B
Absolutely. And the thing that's real awkward with these RAG use cases is they need to be secure, because it's your business data. And AI that's not integrated and bolted into your traditional data systems, with their own data structures, is not going to actually be AI benefiting your business, because it won't work on your business data.
[09:09]
A
So this is interesting. Maybe this is why there's a new MIT report that just came out this past month. It says 95% of Gen AI business PoCs are failures.
[09:21]
B
Yeah.
[09:22]
A
Do you think that's because they didn't look at the business case, that they were just.
[09:26]
B
Absolutely. And I think that a lot of it comes down to feeling like you have to do something with AI, as opposed to intelligently thinking from the start about what those use cases should be that would benefit us. There's a lot checked at in data quality data fencing. So there's work to be done there. And you know, with the MIT study that you're referencing, most enterprises, their IT is trained: if it's a mature stable tech stack and it's got disaster recovery, you leave that sucker alone. You're not just going to go add new technology because it's new and it's better. And so there's a cautionary ella we are deploying clients because it is going to be to data systems in traditional business word.
[10:22]
A
Well that totally makes sense because when I was a CIO I wanted to be admissible to the world. Right, because CIOs are not admissible to the board when there's problems.
[10:33]
B
Exactly.
[10:34]
A
And that makes sense. IT organizations are typically risk-averse because I have to keep the lights on. AI projects are very disruptive.
[10:43]
B
Yeah. And the challenge is that if you view it like it's its own separate thing then you're duplicating your data systems to be able to use AI.
[10:53]
A
All I did was create another data silo.
[10:54]
B
All you did was create a data silo and more cost and more storage. And so really the best thing to do is prototype small and tight. Small and tight. Take advantage of the extra capacity you have and then really judge whether new hardware is necessary based on: will I get more if I'm running faster, or will I decrease my liability at all because I ran faster? And if that math works, that translates meaningfully to token cap. If that math doesn't work out then you're hearing a pitch from the tech Spire based on why the abortail kits at so different matter. But you're going rapidly with the cost.
[11:43]
A
With the duplication so and also new infrastructure and I don't necessarily know how to manage maybe I already have all my infrastructure totally optimized to run like a dream. I can sleep on the weekends. Stay tight.
[11:57]
B
Do you want to reopen your infrastructure, do open heart surgery, or do you want to really start small, deliver capabilities to the customers of the IT party just in your business, and then as demand peaks, which it usually does, then you've got your data you know tech debt to Tinkera easy and then you can justify: I can support 10,000 users with just the CPU 32-node cluster that I have, I use the extra 60% in utilization for AI. Well, now 20,000 need to use it, so I'm going to add some smoshy sauce. And so looking at it from that perspective of for this bit what do I get More revenue A lot.
[12:42]
A
More logical scene over the last three years in this crazy AI spin that's not no rob at all very very.
[12:52]
B
There'S revenue it's just more investment.
[12:56]
A
Okay so let's talk about individual use cases. You mentioned summarization where would you find those little nuggets? What crane do you use dresses.
[13:08]
B
You know, one of the most useful things for summarization inside enterprises: our own sellers at Intel, they have a tool called beta, which is basically a database of all the support calls and questions and answers. When you're on the phone with a customer whose line's down (and I started out as an application engineer, I lived this), having something very quick that summarizes: is somebody else seeing this, what is the possible part?
[13:35]
A
Because before we have an email list the solution the solutions emailed us and you send out an email oh please someone help me with this and you would just wait.
[13:51]
B
Through the roof the opex pass when the oracle are just put in classic because the API calls can do them straight at the $30 call then that's on topic in Cloud Institute if you have a 5000 versal clusters an organization hitting that rag bully Bibli this is where small language models things that help keep things within bounds of your business Customer support is a very human one I've had somebody tell me they've seen a call center of 20,000 people bring on for service there's some acceleration than vanadi you're not talking H200 black modes you're talking you know very inexpensive GPU accelerate but that's for service it's intelligent architect workflows that really.
[14:57]
A
Makes a big difference. Call center is one. Another one I see is policy analyzer. I don't know what the right word is, but like the policy chatbot. What's the. I've even used it myself with weird travel reaching things. I say, what's the travel policy? Can I do this? And it says, hey, here are the guidelines, here's how you could do what you want to do. CP so I'm seeing deeply super internal.
[15:25]
B
How does this work?
[15:25]
A
How does this work? HR policies doesn't be quite realm.
[15:29]
B
Yeah. And there's another great one that I see. I've seen it used in insurance and I've seen it used in just business expense audit, where somebody submits an expense report and instead of having to look at the receipts and go, is this a legit case there? Basically the AI model is trained to look across in sphere. Is there anything here that's out of sorts, that doesn't make sense? And they recommend we should look into this deeper, or I don't find any. So it just helps the human not have to do as many actions.
[16:01]
A
I'm glad you kept the human on there because we had a little bit of talk earlier. The human still needs to help making the decision.
[16:08]
B
Yes, yes. Yeah. AI is not going to be a good judge. AI is going to help very quickly collect information represent in his nation so that you can use judgment and critical thinking, which is resilient because you see a lot of AI chatbots and on wilds speaking some very dicey recommendations. And so it's not a replacement for critical thinking. It's a tool to get information in a format that you can think about critically, fast.
[16:40]
A
Okay, so what that means is we have to retrain some of our workforce.
[16:44]
B
Yes.
[16:45]
A
Because a lot of our workforce, and I've noticed this, humans become the glue between processes and data and things like that. We put humans in the area where data is more unstructured or esoteric or things like that. Right. That part of our job is being replaced.
[17:07]
B
Yeah, the data is. The story is no longer valid, inverted. And it's interesting because if you look at trying to develop PowerPX, you've got most people are spending a day a week if not more on developing paths. So I think that's a good place to fund efficiency. Msul's path is really on data that's structured so that you can make recommendations and inclus humans. Otherwise we wouldn't be presenting. And so if you think about it, he's not respecting prompting to get power picking out of the rat where you got financial results or pastor players, you Got things like that all being put into the rat. He has to chat. But great power then it needs to look like it's busy. If you know what your narrative is then you're toast. It's going to be just like a typical PowerPoint that's by somebody who has a buyer. What is my unique training?
[18:10]
A
So I, I love that because what it means is we have to up our table series as I said exactly myself. I can't tell you how many PowerPoints I looked at and sent back and said to me I have no idea where you want. Right. So what that means is we as humans have to train ourselves more on critical thinking, art mitigations. These are, these are three core skills that we're going to have to learn to redo. I think we've been relying too much on I don't know what to use. I was so busy gathering all the.
[18:45]
B
Data I don't know what I'm supposed to show. And that's where you have gaps in being able to influence decisions. It sucks it yes. Because they're very impatient because they have very little time. It cannot be enough. Spark deeply and so if you don't tell them here's the decision that has to be made, here's the recommendations and here's the support is the up front and then you build the layers of the why that then even with frat what comes after that?
[19:17]
A
Great question. You're going to stolen a question. So this, this tells me we've got to do a better job. Upskills a hard hit though right. To become AI augmented. That's. That's a term I'm sorry to use for bar. AI augmented, right? Not AI replaced. If you really value your employees and you value your company, then you want more intelligence at the table.
[19:44]
B
Yeah. I mean, the humans are the ones with the judgment. And you know I joke in the roost fee a meat drink caffeine, do stupid stuff faster. And when you look at blind AI, it's do stupid stuff at the speed of light. And so you just need that human shanjebe attitude because that waffle is what's accelerated your judgment at the insights about what does this need strategically take action.
[20:13]
A
On top of what it tells me is we should behave it elevate our businesses by adopting this appropriate, because we're going to make smarter decisions. We're going to be strategic and let the tactics run through automation and things like that. So that, to me, is the sweet spot for ours to be successful. And I don't need to write haiku. I don't need that general model to do. As far as.
[20:44]
B
No, you don't. You really don't. For most of my business applications. Now, personal use, it's fine. But one of the things when you're using the chatbots without the enterprise context: you don't own the copyright to the output. And so you know, that's the other reason you just got ginky. Really Geek fillet. What do I get out on slofts vertical locks and those SLMs and vertical models are going to run in a hybrid environment. It will have cloud. It will have on-prem, and having an intelligent architecture is a cheap. That's a little critical. The rest of the general purpose stuff, it's great. It may use all sorts of things but if your business is your business really designed to use this one.
[21:30]
A
Well, it reminds me of CFT was hauling flowers there. We're seeing right where everything is. If I architect appropriately, then I'm not affected by outages. I'm not affected by data breaches. It can handle everything. So we have to use those same architectural elements that we've been developing the last 20 years in AI as well. We can do hybrid AI. Hybrid gen AI.
[22:03]
B
See, it's interesting because you know it's very similar to the cloud, right? It used to be just a flat lift and shift, and then everyone says, wait a minute, this is way more expensive. Why is this expensive? And then they had to leave FIFA workbooks. I think it's the same principles.
[22:18]
A
So this has been great.
[22:20]
B
I talk. Thank you. I have to go.
[22:23]
A
Yeah, we should. You should come on the show more often.
[22:26]
B
Good luck.
[22:27]
A
All right. Hey, thanks again.
[22:28]
B
Thank you, Darren.
[22:34]
C
Thanks for listening to Embracing Digital Transformation. If you enjoyed today's conversation, give us five stars on your favorite podcasting app or on YouTube. It really helps others discover the show. If you want to go deeper, join our exclusive community at patreon.com embracing digital, where we share bonus content and you can always connect with other change makers like yourself. You can always find more resources at embracingdigital.org.
[23:00]
C
Until next time, keep embracing the digital transformation.