Do LLM crawlers respect the robots meta tag?


It is currently possible to use robots.txt to disallow Large Language Model crawlers via user-agent strings:

User-agent: GPTBot
Disallow: /
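As a sanity check on what this rule does, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the example.com URLs are placeholders I made up for illustration. It only shows how a compliant parser reads the rule, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The same rule as above, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant parser would let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs; the rule applies to the whole site for GPTBot only.
    print(is_allowed("GPTBot", "https://example.com/users/alice"))
    print(is_allowed("SomeOtherBot", "https://example.com/users/alice"))
```

The rule is matched on the user agent alone, so every path on the site is treated the same way for that crawler.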

However, this approach is coarse: it works for site administrators, but it doesn't let users of a CMS, for example, opt out on a per-account basis.

I'm trying to understand whether the robots meta tag can be used for more granular control, for example:

<meta name="robots" content="noindex">

Also, do LLM crawlers even treat noindex as an opt-out, or is there a new meta content keyword to use instead, for example noteach or nolearn?
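To make the question concrete, this is roughly the page-level check I imagine a crawler would have to perform. It is only a sketch built on Python's standard html.parser. The names RobotsMetaParser and page_opts_out are my own, and noteach / nolearn are the made-up keywords from above, not directives I know any crawler to recognize.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self, names=("robots",)):
        super().__init__()
        self.names = {n.lower() for n in names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize them.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() not in self.names:
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def page_opts_out(html, opt_out_tokens=("noindex",), names=("robots",)):
    """Return True if the page carries any of the given opt-out tokens."""
    parser = RobotsMetaParser(names)
    parser.feed(html)
    return bool(parser.directives & {t.lower() for t in opt_out_tokens})


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(page_opts_out(page))
    # Hypothetical keywords: nothing here implies crawlers honor them.
    print(page_opts_out(page, opt_out_tokens=("noteach", "nolearn")))
```

A CMS could emit such a tag only on pages belonging to accounts that opted out, which is the per-account granularity that robots.txt lacks.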
