
The 404 Media Podcast: "The Tokenpocalypse Is Here"

Date: July 1, 2026
Hosts: Joseph (“Joe”), Sam Cole, Emmanuel Mayberg
Producer: Alyssa Midcalf

Overview

This episode dives into "The Tokenpocalypse," an abrupt shift in how the AI industry thinks about usage costs. The focus is “token spend”: the money companies pay for the tokens their language models consume and produce, which is increasingly how AI usage is measured and billed. The hosts break down how major companies are reeling from AI’s unexpected expenses and scrambling to put limits and controls on usage after a period of unbridled enthusiasm and “token maxing.” They also explore workplace jargon and memes (“token chewing,” the “Caveman plugin”), odd corporate policies, and new tools that throttle AI verbosity, all aimed at reining in surging AI fees.

The episode is divided into two main sections:

The rise of token spend anxiety and consultant-driven “solutions”
Weird and funny ways companies are trying to reduce AI costs, like making chatbots speak in caveman talk

Key Discussions & Insights

1. What is the 'Tokenpocalypse'? (02:51)

Joseph explains the term “Tokenpocalypse,” detailing how companies once saw AI as a cost-saver but are now stunned by the real costs of “token spend.”
- Quote (Joe, 03:42):
  “There’s this big paradigm shift where, oh, this isn’t actually just automatically saving us money. … We now have to figure out, shit, how do we stop spending so much money on these tokens?”
Companies are moving away from unlimited or bundled usage towards tracking and capping every token (unit of AI output/input).
- Examples: GitHub switching to token-based billing, Uber blowing through its AI budget, Walmart halting employee use of certain tools.
- Quote (Joe, 12:14):
  “Uber… capped employees’ use of AI tools, like Claude Code and Cursor… the Uber CTO said the company had blown through its entire AI budget in just a matter of months.”

Timestamps:

[02:51] — What is Tokenpocalypse? Defining token spend
[04:24] — The narrative shift from “token maxing” to “token panic”
[12:08] — Examples: GitHub, Uber, Walmart

2. The Role of Accenture: Consultant-as-Problem Solver (05:19)

Accenture advises clients on optimizing token spend and is tracking the same problem internally. The firm was part of the original push to adopt AI and now pitches itself as the expert that can solve the ensuing bill shock.
Internal audio obtained by Joe reveals that, surprisingly, non-engineers are driving much of the AI spend, often on very basic tasks (e.g., converting PDFs into markdown or slides).
- Quote — Stuart Henderson (Accenture Exec, recalled by Joe, 08:01):
  “I’m learning that’s one of the big token chewers, turning PDFs into markdown. Is that right?”
- Quote — Justice Kwok (Agentic AI Strategy Lead, Accenture, 08:11):
  “Yeah, that’s actually the behavior that we’ve been seeing.”
Accenture is developing a product called “Token IQ” to manage and throttle token usage for internal staff and clients.

Timestamps:

[05:19] — What does Accenture do here?
[07:48] — Who is really burning tokens, and for what?
[15:59] — Accenture’s ‘Token IQ’ product to control spend

3. The Employee Experience: Mixed Signals and Corporate Contradictions (13:45)

Emmanuel captures employee confusion:
- Quote (14:06):
  “Employees are getting very mixed messages. On the one hand: ‘AI is going to change everything! ... use it all the time!’ And now: ‘please don’t bankrupt the company because we’re spending way too much on it.’”
Examples from consulting and corporate tech:
- Financial Times report that Accenture began forcing employees to use AI or risk missing promotions.
- The new trend: after a “spend as much as you want” vibe, employees are suddenly restricted and sometimes penalized.

Timestamps:

[13:45] — Employee frustrations and analogies (“pivot to video”)
[14:46] — Comparison to previous tech hype cycles

4. From Corporate Waste to Caveman Plugins: The “Caveman” Solution (24:03)

Some companies are using technical hacks—like the “Caveman plugin”—to slash AI verbosity, reducing unnecessary dialogue and token usage.
- This plugin rewrites AI outputs in much more terse, “caveman” language, avoiding flowery explanations and cutting costs.
Joe tried the plugin:
- Quote (Joe, 28:47):
  “If you’re asking it to do code, it just like, here’s the code. ... It just replied with something like ‘users API, no scraping’. I’m like, damn, that’s true. Right on point.”
Developers at OpenAI, Nvidia, GitHub, and others are using this plugin (sometimes even contributing code), not officially company-sanctioned but picked up by individual engineers to reduce spend.

Timestamps:

[24:24] — Genesis of Caveman plugin
[26:41] — How it changes responses and Joe’s experiments
[32:30] — Who’s actually using it
[34:03] — Token savings (estimates: up to ~65% less token usage)
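
To put the "up to ~65%" estimate in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the monthly token volume and the per-million-token price below are hypothetical placeholders, not figures from the episode, and the sketch assumes the reduction applies to billed output tokens only.

```python
def token_cost(tokens: int, usd_per_million: float) -> float:
    """Return the USD cost of `tokens` tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


def apply_reduction(cost: float, reduction: float) -> float:
    """Return `cost` after cutting token volume by `reduction` (0.0 to 1.0)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0.0 and 1.0")
    return cost * (1.0 - reduction)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
MONTHLY_OUTPUT_TOKENS = 2_000_000_000  # assumed monthly output volume
USD_PER_MILLION_TOKENS = 10.0          # assumed flat price, not a real rate
CAVEMAN_REDUCTION = 0.65               # the "up to ~65%" estimate from the episode

baseline = token_cost(MONTHLY_OUTPUT_TOKENS, USD_PER_MILLION_TOKENS)
terse = apply_reduction(baseline, CAVEMAN_REDUCTION)

print(f"baseline: ${baseline:,.2f}")
print(f"with terse output: ${terse:,.2f}")
print(f"saved: ${baseline - terse:,.2f}")
```

Because the estimate is an upper bound, real savings would likely be lower, and any tokens spent on input (prompts, attached files) would not shrink at all under this assumption.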

5. Other Corporate Tactics: Bob Coins and Model Lockouts (36:50)

Emmanuel brings in examples from IBM and Citi:
- IBM’s internal LLM agent “BOB” uses “BOB coins,” an internal currency employees must spend to access the tool. Employees now run out and joke about being out of BOB coins.
  - Quote (Emmanuel, 38:14):
    “I saw a screenshot on Slack… a bunch of IBM employees complaining they’re fresh out of BOB coins… that is just like such a funny, dystopian snapshot… what the push and pull from AI tools is doing.”
- Citi encourages staff to use less powerful (cheaper) models for mundane tasks, warns of excessive spend, and in some cases, according to screengrabs, allegedly restricts access to advanced models entirely.

Timestamps:

[36:50] — IBM’s “BOB coins” and employee experiences
[38:35] — Citi’s emails pleading for less powerful AI usage
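
The general idea behind an allowance like "BOB coins" can be sketched as a simple per-employee ledger. This is a hypothetical illustration of metered access, not a description of how IBM's system works; the class names, the allowance of 100 coins, and the per-request cost are all assumptions.

```python
from dataclasses import dataclass


class OutOfCoins(Exception):
    """Raised when a request costs more coins than the employee has left."""


@dataclass
class CoinLedger:
    """Hypothetical per-employee allowance; not IBM's actual system."""

    balance: int

    def charge(self, coins: int) -> int:
        """Deduct `coins` and return the remaining balance."""
        if coins < 0:
            raise ValueError("coins must be non-negative")
        if coins > self.balance:
            raise OutOfCoins(f"need {coins}, have {self.balance}")
        self.balance -= coins
        return self.balance


ledger = CoinLedger(balance=100)  # assumed monthly allowance
ledger.charge(60)                 # first request goes through
try:
    ledger.charge(60)             # second request exceeds what is left
except OutOfCoins as err:
    print(f"request blocked: {err}")
```

A hard stop like this is the bluntest option; a softer variant might fall back to a cheaper model once the balance runs low, which is closer to what Citi is described as encouraging.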

6. Philosophical & Cultural Takeaways

The hosts’ frustration with “cute” or overly verbose AI/UX design:
- Quote (Emmanuel, 31:21):
  “The most annoying thing in the world is when software is like cute and tries to talk to you… Like Microsoft made this change… it’s like ‘shut the fuck up. What’s the error code?’”
- Sam, 31:10:
  “I think it should have been command line the whole way. We should have never strayed from that.”
The growing realization among companies: These chatty, over-friendly AI agents were never actually designed with cost in mind, and new controls/boundaries are now necessary.

Notable Quotes

On Accenture’s Role:
- “It’s funny how they position themselves as the problem in the first place and the cure, where they’ve told companies, you need to get on AI… and then it’s like, oh, wait, you’re spending a bunch of money on tokens. Maybe we can help with that.” — Joe (06:51)
On AI’s Ubiquity, Then Panic:
- “Companies are like, ‘AI is going to change everything!’… And now: ‘please, don’t bankrupt the company because we’re spending way too much on it.’” — Emmanuel (14:06)
On Verbosity and Caveman Plugin:
- “It just goes Hulk smash basically. So if you’re asking it to do code, it just like, here’s the code. It doesn’t give you this over the top explanation.” — Joe (28:47)
On Dystopian Workplace Currency:
- “I saw a screenshot on Slack… a bunch of IBM employees complaining they’re fresh out of BOB coins… that is just like such a funny, dystopian snapshot.” — Emmanuel (38:14)

Episode Timeline

00:00–02:51 — Cold open, introductions, opening banter
02:51–05:19 — Defining Tokenpocalypse and token spend
05:19–12:08 — Role of Accenture, internal consultant audio leaks, “token chewing”
13:45–15:59 — Employee confusion, AI usage mandates, “pivot to video” analogy
24:03–29:16 — The Caveman plugin: origins, effects, Joe’s test
32:30–34:03 — Caveman plugin’s adoption at major tech companies, token savings data
36:50–38:35 — Other company tactics: IBM’s BOB coins, Citi’s restrictions
Wrap: Philosophical reflections, subscriber promo (end of content section)

Overall Tone

Dryly funny, skeptical, and a bit grumpy about corporate tech trends and the industry’s willingness to repeat mistakes. There’s a note of exasperation with corporate jargon and a running joke about “cute” software and the pain of the modern workplace.

For Listeners

If you want to understand the reality behind the “AI at work” hype—a mix of excitement, top-down mandates, chaotic spending, and now, bizarre attempts at cost control—this is an excellent snapshot from journalists deep in the weeds. The episode illustrates the pendulum swing from “maximize AI everywhere” to “whoa, too much, dial it back,” with plenty of memorable moments (and memes) along the way.

Key takeaways:

AI isn’t free—token-based billing has changed the corporate landscape.
Consultants were first to push AI, now they’re profiting from reining it in.
Most token spend isn’t genius “10x engineers,” it’s ordinary tasks (slide decks!).
Companies are imposing everything from weird metering (“BOB coins”) to plugins that force AI to be terse (“Caveman”).
Employees are caught in the crossfire, given contradictory incentives and tools.
There’s a growing backlash against excessively verbose, “cute” software.

For blog links and subscriber content, visit 404media.co.


Transcript

[00:00]
Emmanuel Mayberg
The most annoying thing in the world is when software is like, cute. Microsoft made this change years ago where, like, if you have a crash, you're like, oh, something bad happened. It's like, shut the up. What's the error code? What's the error code? Because I'm going to have to like, plug it into Google.
[00:22]
Joe
Hello and welcome to the 404 Media podcast, where we bring you unparalleled access to hidden worlds, both online and IRL. 404 Media is a journalist-founded company and needs your support. To subscribe, go to 404media.co. As well as bonus content every single week, subscribers also get access to additional episodes where we respond to the best comments. Gain access to that content at 404media.co. Also remember to subscribe to our YouTube channel, where you can watch all of our episodes. Subscribe at youtube.com/404mediaco. I'm your host, Joseph, and with me are two of the 404 Media co-founders. The first being Sam Cole.
[01:04]
Sam Cole
Hello.
[01:05]
Joe
And the other being Emmanuel Mayberg.
[01:07]
Emmanuel Mayberg
Hello.
[01:08]
Joe
So Jason's not here. He's probably on a private jet somewhere. If you have read the website at the time of recording, you'll know what that's referring to. I won't spoil it. Go read that really, really good article if you want, and you should, but otherwise I'm sure you all will speak about it next week on the pod. I don't think I'm here next week, but Jason will fill us in there for sure. Sam, do you want to take us through this one?
[01:38]
Sam Cole
Yeah. So this first one, we have a set of stories that kind of go together. So the first one is by Joe the Token Pocalypse. Tokenopolypse.
[01:53]
Joe
I even considered. Well, this was the hardest bit is the tokenpocalypse.
[01:57]
Sam Cole
Tokenpocalypse. Tokenpocalypse is here. Companies are scrambling to stop spending so much on AI. I feel like we have coined a few token related words. Did we coin token maxing or was that an existing term?
[02:13]
Joe
Emmanuel, what do you think?
[02:15]
Emmanuel Mayberg
I believe the brain geniuses at the AI companies actually came up with that.
[02:20]
Sam Cole
Fine.
[02:20]
Joe
All right, so I'm not on this one. I thought I coined it. I thought I coined tokenpocalypse. And then I think I googled it after the fact. I found TechCrunch had used it in a headline like a few days before. Fuck. So it's in the air. Yeah.
[02:39]
Sam Cole
Yeah. So, yeah. So we're introducing some new words here. We're token maxing, we're token apocalypsing. So do you want to just define what we're talking about when we're talking about the token apocalypse?
[02:51]
Joe
Yeah. So in my definition, after coining this term, days after TechCrunch and presumably some other people, the reason I kind of use this term is that something has really, really shifted in the AI industry, where we've had companies rushing to adopt AI in all sorts of forms. Maybe that's in coding, maybe that's agentic AI in their businesses as well and whatever. And companies have sort of been doing this, maybe not regardless of the cost, but it's been presented to them as cheap, right? Oh, well, this is cheaper than people. You do your subscription or whatever it is to an AI company every month, every year or whatever. Maybe if you're enterprise, you get a big deal. But there's this shift happening where now there's much more emphasis on the cost of individual tokens, which, you know, is basically user usage metrics. Right. So now you'll have companies like GitHub being like, we're actually going to charge you per token. And all of a sudden companies are realizing, oh, AI actually costs us a lot of money, and we're burning through these tokens, which we'll talk about in a bit. So that's why I used that term, and I think that's why other people are using it as well. There's this big paradigm shift where, oh, this isn't actually just automatically saving us money. We now have to figure out, shit, how do we stop spending so much money on these tokens? Basically, yeah.
[04:25]
Sam Cole
Which is, I mean, it's rich, because of the last few months. Jason wrote a story that... and we'll get into just the ways that this has been working in the last few months. But Jason wrote a story a couple months ago about token maxing and how that was the thing in March and April, to be bragging about how much token spend you have, or these leaderboards that we can get into in a bit. But before we do that, this story that you wrote involves audio that you obtained from a consulting company called Accenture. And I think if you're in tech, you know what Accenture is. But even when I was editing this story, I was like, what are we even... are we talking about Accenture's token spend? Or Accenture is consulting on token spend? So can you just walk us through, like, how does Accenture come into play in this particular story? Like, what are they? What's their place in the token mania, I guess?
[05:19]
Joe
Yeah, I mean, who knows what Accenture does?
[05:24]
Sam Cole
Yeah, it's one of those.
[05:26]
Joe
That's an open question. Yeah. Obviously they're a consulting giant and they've done stuff like work on content moderation for Facebook.
[05:36]
Emmanuel Mayberg
Right.
[05:37]
Joe
Like a significant amount was outsourced to Accenture back in the day. They will fulfill this role where they come into companies and be like, hey, we can help you optimize this task, or we can cut off expenditure on this resource or whatever it is. Right. And in this particular case, we're talking about two things. They are talking about token spend inside Accenture itself, but also with their clients. Right. They clearly have visibility into what their clients are doing and what their clients are saying. In this audio, it doesn't mention any specific clients. We can't play you the original audio for source protection reasons. But they do say, like, well, we're seeing this sort of thing inside and outside with clients as well. But Accenture, I mean, we'll talk more about this in a minute. But it's funny how they position themselves sort of as the problem in the first place and the cure, where they've told companies, you need to get on AI, you need to do this as quickly as possible. And then it's like, oh, wait, all of these companies are spending a bunch of money on tokens and maybe we can help with that.
[06:52]
Sam Cole
Yeah, weird how that works. Weird how consultants keep getting away with it. So, yeah, like you said, this is based on audio that you had obtained. The audio is from an internal meeting. And so obviously they're discussing things that they don't necessarily want the wider public to hear or are not ready to discuss in public. But there's one part that's pretty key to the conversation, and it's coming from Justice Kwok, who is the agentic AI strategy lead at Accenture. That's a hell of a title. But in the audio, Justice says, "We're seeing from some of the data, internally at least, it's actually not our engineers that are driving the token consumption. It's a lot of the non-engineers that are doing some of those behaviors that we're talking about" in the meeting. So what are the behaviors? What are the non-engineers that Justice is talking about actually spending tokens on?
[07:49]
Joe
Yeah, it's a funny piece of audio because it actually starts with a joke, and it's kind of hard to describe that in an article. It starts with sarcastic comments. So Justice, who you just introduced, is preparing to present in this meeting, and then someone interrupts: Stuart Henderson, who's a very senior person in Accenture. He jokes that, oh, you have these slides, but I hope you didn't use AI to just convert a PDF into markdown and all of that sort of thing. And then Stuart says, quote, "I'm learning that's one of the big token chewers, turning PDFs into markdown. Is that right?" Which obviously, as I said, is a joke, but then leads to a pretty serious question. And then that's when Justice replies with, "Yeah, that's actually the behavior that we've been seeing." So I found this fascinating for a few different reasons. The first, obviously, I love that term, token chewing. I mean, that is a consultant-ass term if I've ever heard one. Maybe Emmanuel or Jason have heard it when they've looked into token maxing or whatever, but I never heard that before. So I like that. But just the idea that it really undercuts that narrative that, oh, this explosive use of AI is all about supercharged 10x engineers or whatever using AI to produce mountains of code and then we can ship products faster or whatever. Here it seems to be non-technical employees doing non-technical tasks, like converting a PDF into a PowerPoint. And it's kind of like the "he admit it" meme, that actually it just isn't all that sophisticated. And to be clear, this isn't to say that AI is not being used in Claude Code or Codex or whatever to make a bunch of code. Obviously it is as well, but we've entered this more mature stage of AI being in companies where the reality is setting in. It's like, oh, people are just using it for a bunch of dumb fucking shit, basically. And hey, it might be a time saver, you know, converting a PDF into a PowerPoint might save you 30 minutes, if not more. Depends how much of a perfectionist you are of your slides. But there's going to be a cost to that down the road that I think people are only just starting to realize. Yeah. And then, I mean, Justice goes on and says more broadly that they're really seeing what they call a rapid escalation in AI token spend. I think they say soaring token spend as well. Maybe stuff isn't dire, but it's to the point where obviously Accenture sees an opportunity here and they're having internal meetings and then we're hearing about it as well.
[10:35]
Sam Cole
Yeah, it's just lazy. It's just the laziness of it all is wild. I mean, you can get AI to summarize your PDF for you, read it for you now turn it into slides. It's like, what is your job actually,
[10:50]
Joe
especially for Accenture, who. A lot of people made this joke in the comments, like Accenture's whole thing, if you're being a bit mean to them or whatever, is making slides. Basically a bunch of consultants just make slides and now apparently they're not even doing that.
[11:03]
Sam Cole
Yeah, amazing, incredibly lucrative career. I guess so, yeah. Not to over explain the joke of it all, but it's just so we see this over and over. It's like the AI is kind of hollowing out our ability to do anything yourself. It's like, can we not make a grocery list for ourselves anymore? Can't read a PDF anymore. Can't read anything anymore. Have to have it summarized, have to have it turn into slides. You slap those slides over to somebody else. Maybe they're using AI to read the slides, recording the meeting, summarizing for you later, I don't know. So, yeah, it's trying to put some value into an already fake job, I guess. So this, like we hinted at earlier, is part of a much wider trend that we're seeing around AI. A couple months ago we had, you know, token maxing. We had bragging about token use and spend, and now we're seeing companies scaling back. So just quickly, what are some of the companies that are actually talking about this outside of Accenture and talking about, you know, charging customers or scaling back employee using AI?
[12:08]
Joe
I mean, the first big change, which I think I mentioned, was GitHub moving to a token model. And that happened, I think, at the start of June, and we're recording this at the end of June, and that's when we've seen, I think, a bunch of media coverage and also, for us, a bunch of leaks from various companies where, oh, wow, that took no time at all to start impacting people. The change comes in June 1st, and then halfway through the month, and then towards the end of the month, we have all of these leaks, including some of which will be in an article that we'll publish after this podcast is actually out. But sort of the biggest one, I think, for me is probably Uber, where they capped employees' use of AI tools like Claude Code and Cursor as well. That came after the Uber CTO said the company had blown through its entire AI budget in just a matter of months. Walmart as well, I think, stopped people using tools recently. So a bunch of massive, massive companies are either trying to find a way to save money, and we'll touch on that in the next story, or they're just, like, pulling the brakes on it entirely. On the token maxing stuff, I think Emmanuel might be better at that. What do you think, Emmanuel, about how this sits in with that whole idea of everybody wants to use AI as much as possible?
[13:46]
Emmanuel Mayberg
I think we'll probably get into this a bit more, definitely in the story that we're about to publish, but also when we talk about the story in the next segment. I think it's just that employees are, like, in a very contradictory position. They're getting very mixed messages. On the one hand, it's like, AI is going to change everything. You have to use it all the time. It's going to make you 100 times more efficient. And that is now overlapping with, but please don't bankrupt the company because we're spending way too much money on it. And I'm trying to think about if I've ever experienced myself such a corporate 180, and I'm guessing it may be like if you imagine one of those pivots to video where they're like, we need everything to be video. It's video all the time. And then you wake up one morning and they're like, we don't have the money to shoot anything and get crews. But even that is more simple than this. They're just getting, like, completely conflicting messages. I don't know how you're supposed to navigate that, to be honest.
[14:47]
Sam Cole
Yeah, that's a good analogy, I think. Or the closest I can think of is, we're shifting strategy this way. Oh wait, we can't actually fund that strategy, but we still expect you to perform and do the work with these tools that you don't have, which is just lovely. And just one last thing on this story because I think it is so good. A couple months ago, the Financial Times reported that Accenture started forcing people to use AI or risk missing out on promotions. So it's not just like, we really want you to use this, it's, if you don't use it, you will not grow in this company. You will not ever get a better paycheck, get a better role. You'll be stuck where you're at. Or I assume probably the implication is we're going to let you go because you don't fit in here anymore. And at the time, a spokesperson told CNBC, "Our strategy is to be the reinvention partner of choice for our clients, to be the most client-focused, AI-enabled, great place to work." And then later in the audio, Justice Kwok says Accenture plans to formally launch a product called Token IQ, which I'm sure is like some kind of throttling mechanism for the token use for employees, but we don't know yet, right?
[16:00]
Joe
Yeah, yeah. Because, I mean, there's a bunch of other stuff in the audio that I didn't really get into, because it does just, like, get probably a little bit too in the weeds and boring for listeners. But basically what they do go on to say is there's all of these different ways that we might be able to curb token spend, and that's like putting budgets in place, user controls and that sort of thing. And I don't think it's as serious as, obviously, the content of chatbots, where we have this big problem where there aren't guardrails and fucking chatbots are telling people to kill themselves and all of that sort of thing. Obviously, it is not as bad as that, but there's an interesting parallel where chatbots didn't have the guardrails for that, and it did all sorts of horrible, potentially unexpected behavior. Here, AI tools have rolled out, and there haven't been guardrails in place to stop companies spending millions of dollars or something on LLMs or burning through their tokens or whatever. So there's an interesting constant there, where these tools are coming out, and it might be a chatbot, it might be a coding tool, but you don't have these guardrails in place. And now Accenture can position itself, after telling its clients to adopt AI as quickly as possible: well, we can help you deal with what Justice calls token economics. I think at the start of the audio, they're introduced as token ops as well. So this is a whole thing now where they're making sort of umbrella terms to talk about just this part of the AI industry. Like, there are people now dedicated just to figuring out how do we not spend so much money on tokens and that sort of thing. Justice actually makes the point that the guardrails weren't there. That's not just me making this point. He's making that as well. And now we're in the position to be able to do that ourselves, which is incredibly convenient for a consulting giant, obviously.
[18:04]
Sam Cole
Interesting how it works. Okay, let's take a break, and then we'll come back and talk about a very related story. But I don't know, a little bit even funnier, even more absurd. So we'll be back.
[24:03]
Sam Cole
So we're back with another story about AI cost, AI spend at companies and how they're dealing with it. Headline is companies are making Claude and Codex talk like cavemen to stop AI's soaring costs. Joe, this is another story that you wrote. Just walk us through how this, how you came across this, I guess, to begin with.
[24:24]
Joe
Yeah.
[24:25]
Joe
So Emmanuel and I are working on another story which probably won't be in the show notes here, but we're going to publish it just around the same time this goes out to all of our free listeners. If you're a paid subscriber, you're getting this beforehand, so please don't listen to it and then go and scoop us. That would be really annoying. But we'll talk about that a little bit. So we're working on that story about the various ways that companies are actually trying to stop this. So whereas the Accenture audio was like, there's this issue and companies are spending too much on AI and we need to help them stop token spend, we looked at the ways that companies are actually doing that, and we'll talk about those in a bit. But a really, really interesting one I got was a leaked memo from a company that does sort of digital infrastructure and actually works on data centers now as well. In this memo, they were talking about the various ways to curb token use. And one was: use the, quote unquote, caveman plugin. And I'd never heard of this. It was highlighted to me and I went to look around and, yeah, it's there on GitHub. That's how I first came across it.
[25:42]
Sam Cole
Yeah, gotcha. So, yeah, you start looking into this. What does it actually do? We should have done this whole podcast in caveman speak, by the way. I don't think I could do it.
[25:53]
Joe
But I was already planning to do that for my Behind the Blog on Friday, but I did something equivalent last week. I think I kind of need to give people a real Behind the Blog.
[26:02]
Sam Cole
Yeah, this is becoming too experimental because
[26:05]
Joe
now I'm just phoning in with like weird bits every week. We should have done that. Why would. No, I was going to do caveman voice, but I'm not going to do it.
[26:14]
Sam Cole
It's like a meme on, like, TikTok and Instagram. You've seen this. It's like, explain things in caveman. Have you, Emmanuel? Emmanuel's more online than... no. It's like your physical therapist explains to you. It's like, yes, protein, no, electrolytes. It's like, electrolytes good. It's a whole thing. It makes me feel like totally brain rotted. Anyway, so yeah, tell us what it does. And also you tried it, so tell us what your experience was with it.
[26:41]
Joe
Yeah, I mean it really does that. It changes the output of Claude and Codex, and I think you can do Gemini and basically any sort of AI tool at this point, and it changes the output to not be the normal verbose, flowery, over-the-top responses that you'll get from these LLMs, and makes it really, really to the point, where it is avoiding completely unnecessary grammar to communicate its point. And I mean there are examples in the piece, but I think it was you, Sam, that sort of gave me this idea when I was writing it. But yeah, it's less the sort of voice of the AI chatbot which is like, oh, you were right to push back. I was wrong. I'm so sorry. Here's a big explanation. It just goes Hulk smash, basically. So if you're asking it to do code, it's just like, here's the code. It doesn't give you this over-the-top explanation. If you're asking a question, it will just shoot out a response which it believes is still accurate. And I guess in my tests I would say it's still accurate. I haven't done it a whole bunch, but the tool does work as advertised. I downloaded it, I integrated it with Claude Code, I asked it to look at some code I had previously worked on. It was actually a way to scrape contracts from DHS procurement databases and that sort of thing. And I wanted to check, if I ask it to look at this code that was previously written, is it going to give me a really in-depth analysis or just tell me how it works? And it just replied with something like, uses API, no scraping. I'm like, damn, that's true. That is right on point. Didn't need to provide any more information there. So it does what it says, which is that it doesn't change your input. Obviously, you can still be as verbose as you want, but it changes it, as the creator told me, just into a terse tool. Like, you're not having a fun conversation with this thing.
Which, frankly, if we're going to have to live with AI chatbots for a bit longer, or if I worked in a company where it was forced upon me, like Accenture or whatever, I personally would much prefer engaging with it like it's a tool, not a conversation. They have to have this fucking computer program every single day. So it seems like it does the job.
[29:17]
Sam Cole
Yeah, yeah, I've tried. So I'm not on a mission to make ChatGPT useful for me. Certainly most of the time when I'm like, oh, maybe it could do this, it doesn't do it well. So I was trying a while back to make it, because I was like, maybe there is something in it if it only spoke like a computer, if it was actually just returning tasks instead of giving me all this. It's like, shut up, stop talking. So I basically told it, I was like, shut up, stop talking to me. You're not a person. You are a computer. Speak in computer syntax. Only speak in the necessary words and return the tasks. And it's really hard to make it do that. So I thought it was interesting that they created a plugin for this instead of just prompt injecting and making it like, you are a caveman. Because I bet it would try to do, like, a fucking character. Like, it would do, like, you know, ooga booga. Like, yeah, yeah, it would work with
[30:15]
Joe
a club and shit, like banging itself.
[30:17]
Sam Cole
It would do a whole storyline. It'd be like, I'm the greatest caveman that's ever existed. It's like, stop. Shut up.
[30:22]
Emmanuel Mayberg
Stop speaking. Tangent, but I agree with you that the most annoying thing in the world is when software is, like, cute and tries to talk to you cute. Like Microsoft made this change years ago where, if you have a crash, it's like, oh, something bad happened. It's like, shut the fuck up. What's the error code? What's the error code? Because I'm going to have to, like, plug it into Google. It's so annoying. And I also just had an experience that was like, I forget what I was calling, but it was like an AI whatever interview thing. And it takes so much longer just because of the pageantry of trying to pretend like you're a person. And it's just like, a person would actually be way more efficient than this. So just, like, cut to the chase.
[31:08]
Sam Cole
It won't do it.
[31:09]
Emmanuel Mayberg
So I hate that stuff.
[31:10]
Sam Cole
I hate it. I do think it's like, this is. I start to feel really old and grumpy. But we have strayed so far when we started doing UI and graphical interfaces that spoke to you like a person, even just like the copywriting on the error messages. Like you said, the original sin, I think it should have been command line the whole way. We should have never strayed from that. But who could have predicted that this is where we would be today is like trying to code using AI and making it talk like a caveman.
[31:43]
Joe
I think just briefly on the point you made, Sam, that, you know, you could just do a prompt, right? As you say, be like, hey, be a caveman. But there could be all sorts of side effects from that. I can't find the exact quote in front of me right now because the GitHub page is a little bit long. But the creator does mention there somewhere, like, this is more effective than a prompt or something. Which I think exactly goes to what you're talking about. Like, they've had to develop a skill. I think it's called a skill, right, in Claude Code parlance. But they've had to make something that goes a little bit further than, could you just shut the fuck up, please? And it apparently works.
[32:19]
Sam Cole
Yeah, yeah, that makes sense. So we have some idea of who might be using this thing. Who is actually using caveman Claude?
[32:31]
Joe
Yeah, so the memo I got was from a company called Legrand, which, I'll be honest, I had not heard of before. I think when Jason edited this piece, he was like, I've not heard of this company. You need to define it. And yeah, as I said, they're electrical and digital infrastructure. And I found it quite funny that they have moved into the data center business and that sort of thing. And they were the ones that have this internal memo that said, hey, maybe you should use Caveman. When I spoke to Caveman's creator as well, he mentioned that he's heard it's been used by individual developers at Nvidia, OpenAI and GitHub, and then some others as well. I'll just stress that doesn't mean there's, like, a company-wide deployment of Caveman at OpenAI or Nvidia or something like that. But that's pretty interesting. Ordinarily it would be like, well, the creator's saying that, and maybe it's in his interest to just say that big companies are using, or people at big companies are using, this tool. But what was really, really interesting is that he flagged to me that when you go through sort of the commit history of the tool, Shane Sweeney, who is the director of engineering at OpenAI, actually contributed code to this really, really early on when it launched, and added functionality for Caveman to work with Codex, which is OpenAI's coding agent. So at a minimum, Shane Sweeney has written code to work along with this tool as well. So I felt much more confident including the creator's assertion that, yeah, people inside big companies are using this. None of those companies have got back to me. GitHub, OpenAI or Nvidia, none of them got back to me. But I absolutely would not be surprised if more and more companies are using this or more individual developers are using this. Because, something I should have said earlier, it does work not just in making the output more basic, but apparently it works with reducing token spend as well.
So you can do a command in there which is like caveman stats or something like that, and it'll come out with, this is how many tokens we think you've saved. So it estimated without Caveman it was like 8.8k or something, and then the tokens saved was like 5.8k, around 65% saved. I would have to look a little bit more into the methodology to figure out, well, how exactly are you determining that? But putting those together, yeah, this could be really, really weirdly valuable to companies. Like, I'm burning through my token allocation or my budget or whatever. And it does make you think, like, maybe we should have just made the software be a fucking piece of software in the first place rather than an AI companion. Obviously I'm being facetious, but it's exactly what Emmanuel was saying, all the cutesiness of it. Now you have to install an independent tool, and someone has to make an independent tool, to actually get these sorts of code agents to where they want to be. And speaking of agents, it's not just, like, integrations with Claude Code or Codex or whatever. I didn't try this, but you can download a version for OpenClaw, which is the agentic AI, right? And you can even download an entire agent which does everything in Caveman. So not just the outputs, but apparently a more all-encompassing approach to Caveman as well. I haven't tried that one, but I imagine that one works as well.
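The savings figure Joe quotes can be sanity-checked with simple arithmetic. The sketch below assumes the spoken numbers mean roughly 8,800 tokens without Caveman and roughly 5,800 tokens saved; the function name is hypothetical, and this is not Caveman's own methodology, which the episode does not describe.

```python
def savings_fraction(baseline_tokens: float, saved_tokens: float) -> float:
    """Return the share of the baseline token count that was saved (0.0 to 1.0)."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    if not 0 <= saved_tokens <= baseline_tokens:
        raise ValueError("saved_tokens must be between 0 and baseline_tokens")
    return saved_tokens / baseline_tokens


# Assumed readings of the figures mentioned in the episode.
baseline = 8_800  # estimated tokens without the plugin
saved = 5_800     # estimated tokens saved

fraction = savings_fraction(baseline, saved)
tokens_used = baseline - saved

print(f"saved {fraction:.1%}, used {tokens_used} tokens")
```

Under those assumptions the ratio comes out just under 66%, which is in line with the "around 65%" quoted on the show.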
[36:26]
Sam Cole
Yeah. So you teased at the top. Emmanuel has a story coming. You and Emmanuel both have a story coming about other companies that are dealing with this entire problem that we've been talking about for the last 30 minutes. Emmanuel, what else do we have coming down the line? Like you saw some stuff with Citi, maybe IBM, like what's, what are we looking at coming up?
[36:50]
Emmanuel Mayberg
So I think we want to save kind of our best nuggets of information here for the article, which will be published tomorrow. But there are a few things that I think are not going to make it into the story that I think are pretty funny and are a good example of what people are dealing with. That is just, like, examples of how companies are trying to limit the token spend, which is really just the amount of money that these companies are spending on AI tools. The long and short of it is that access used to be unlimited and now there is a limit. This I have experienced, where it's like, oh, we used to have unlimited access to Getty and you can use all the Getty images you want, and then it's like, you need to sign four forms and you only get, like, one image a week. And that's kind of like what is happening at these companies. And one of those companies is IBM, which has its own internal coding agent called Bob. And in order to use Bob you need to use Bob coins. And I saw a screenshot on Slack and it's a bunch of IBM employees complaining that they're fresh out of Bob coins. And it's like, oh no, my Bob coins, I'm out of Bob coins. How am I going to finish this project? And I thought that is just such a funny dystopian snapshot of what the push and then the pull from AI tools is doing to the average employee. I think, Joe, did you want to talk about the Citibank one?
[38:35]
Joe
Yeah, I'll talk about the Citi one. So Citi, the banking conglomerate, as I'm sure many people will know. I've seen emails where Citi says essentially, whoa, whoa, slow down, please stop using so much AI. Obviously I'm paraphrasing; you'll be able to see the actual quoted email in the piece. But they approach it in a couple of different ways. The first is that they really, really encourage people, hey, please, please use the appropriate model for the task. And I think we're seeing this across a bunch of companies, where users are defaulting to the most recent and the most powerful and the most token-hungry model. For Claude or whatever, that would be Opus 4.7 or 4.6, and then GPT 5.5 for OpenAI. And people are using those models with reasoning as well, which is supposed to be, you know, the AI can really work through some problems and figure it out itself. They're using that for basic stuff, which I presume is stuff like making presentations. So we were talking about that sort of thing. So they're encouraging, pleading with employees, please stop using the powerful stuff for that. And then in other cases, according to the emails I've seen, Citi has just cut off access to a bunch of the more powerful models as well. So it started with please don't, and now it's like, you can't use this unless you have some very, very specific use case. I should say, I went to Citi for comment about this and they said on background that, we haven't actually closed off any access to models and we're not doing this. I then saw a screenshot where someone is unable to access the models. So I don't think that statement is fully accurate. Or maybe you need to read between the lines there.
But yeah, that piece will come out around about the time this podcast is out and I'll try and get it in the show notes, but we got a bunch more detail from a bunch more companies about all of the different ways they're trying to curb this AI use, essentially because I think it's out of control for a lot of people, and I think you'll probably see that in some of the stats in the story as well. All right, should we leave that there? If you're listening to the free version of the podcast, I will now play us out. But if you are a paying 404 Media subscriber, we're going to talk about flowers that don't even exist. You can subscribe and gain access to that content at 404media.co. As a reminder, 404 Media is journalist-founded and supported by subscribers. If you do wish to subscribe to 404 Media and directly support our work, please go to 404media.co. You'll get unlimited access to our articles and an ad-free version of this podcast. You'll also get to listen to the subscribers-only section where we talk about a bonus story each week. This podcast is produced by Alyssa Midcalf. Another way to support us is by leaving a five-star rating and review for the podcast. That stuff really, really does help us out. Or just tell a friend about us too. This has been 404 Media. We'll see you again next week.