Voices of Search Podcast
Episode: Blocking LLMs from Proprietary Data?
Date: April 2, 2026
Host: Tyson
Guest: Kaspar Siminsky, Senior Director at Search Brothers and former member of the Google Search team
Episode Overview
In this concise episode, Tyson and Kaspar Siminsky discuss the challenges and realities of keeping proprietary data away from Large Language Models (LLMs) and crawlers, especially for enterprise-level organizations. The conversation addresses the binary nature of web visibility and offers practical advice for those concerned about sensitive or proprietary content.
Key Discussion Points & Insights
The Binary Nature of Web Accessibility
Defining Proprietary Data
- Kaspar opens by questioning how proprietary the data really is: if it's truly sensitive or critical, it arguably shouldn't be available on the public web at all.
Visibility Equals Crawlability
- "If it's public, if it's accessible, it's going to get crawled." — Kaspar Siminsky (02:25)
- The web operates on a clear binary: content is either available to all, including bots and LLMs, or it's entirely restricted.
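As a concrete illustration of that binary (not something walked through in the episode), the "entirely restricted" side means an actual access-control gate at the server, not a crawler hint. A minimal nginx sketch, assuming a hypothetical /proprietary/ path and a standard htpasswd credentials file:

```
# Hypothetical sketch: content behind this gate sits on the "restricted"
# side of the binary -- every request, human or bot, must authenticate.
location /proprietary/ {
    auth_basic           "Restricted area";
    auth_basic_user_file /etc/nginx/.htpasswd;  # credentials checked on every request
}
```

Anything served without such a gate sits on the public side, where, as Kaspar notes, crawling is a question of when, not if.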
The Inevitability of Leaks
- Even with various crawling restrictions, there's always a risk:
- "If it's crawled by some bots, chances are it's going to leak." — Kaspar Siminsky (02:25)
Practical Approach for Enterprises
Protecting Truly Proprietary Data
- The most effective approach: don’t put sensitive content online if you want to guarantee it stays off LLMs and away from crawlers.
Limits of Technical Barriers
- Robots.txt, CAPTCHAs, and other barriers can help, but they aren't foolproof against determined actors or evolving crawling strategies.
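For reference (an illustration, not something covered step-by-step in the episode), the typical first barrier is a robots.txt file naming the AI crawlers' published user-agent tokens. The tokens below are the ones the major vendors document; crucially, robots.txt is a voluntary convention, which is exactly the weakness Kaspar points to:

```
# Hypothetical robots.txt sketch: asks documented AI crawlers to stay out.
# Compliance is voluntary -- this is a request, not an enforcement mechanism.
User-agent: GPTBot           # OpenAI
Disallow: /

User-agent: ClaudeBot        # Anthropic
Disallow: /

User-agent: CCBot            # Common Crawl
Disallow: /

User-agent: Google-Extended  # Google AI training opt-out
Disallow: /
```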
Notable Quotes & Memorable Moments
On the Hypothetical of Blocking LLMs:
- "If it's really proprietary and if it's something that we do not want to get scraped and crawled, ultimately then it shouldn't be accessible in the first place."
— Kaspar Siminsky (00:41)
- "If it's really proprietary and if it's something that we do not want to get scraped and crawled, ultimately then it shouldn't be accessible in the first place."
On Public Content Risks:
- "If it's public, if it's accessible, it's going to get crawled. Ultimately, it's kind of like a binary choice."
— Kaspar Siminsky (02:25)
- "If it's public, if it's accessible, it's going to get crawled. Ultimately, it's kind of like a binary choice."
Important Segment Timestamps
- 00:22: Episode premise; introduction of guest Kaspar Siminsky, his credentials, and the topic at hand.
- 00:41: Discussion begins on what it means to block LLMs from proprietary data.
- 02:25: Kaspar outlines the binary nature of online content and the risk of exposure if made public.
Tone & Takeaway
The conversation is frank and practical—Kaspar avoids technical jargon and opts for real-world logic: Unless you're willing to keep your proprietary data offline, you must assume that it could eventually be accessed by LLMs and crawlers. For enterprises especially, this means rethinking which assets are truly suitable for online exposure.
Bottom line: If you don't want it scraped, don’t let it be visible online—no technical fix is foolproof.
Summary At-a-Glance
- If it's on the web, it may be accessed by LLMs/crawlers—there’s no middle ground.
- Protect truly sensitive information by keeping it offline.
- Technical solutions offer only limited protection; the best defense is not publishing.
- Enterprise decision-makers must make careful, binary choices about data exposure.
For more information about Kaspar Siminsky, visit Search Brothers or check the show notes for his LinkedIn profile.
