It is currently possible to use robots.txt to disallow Large Language Model crawlers via user-agent strings:
User-agent: GPTBot
Disallow: /
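(For reference, there seem to be several such crawlers now, for example OpenAI's GPTBot, Common Crawl's CCBot, and Google's Google-Extended token. The list keeps changing, so take this as an illustrative sketch rather than a complete block list:)

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```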
But this approach is very broad: it works for site administrators, but it wouldn't let users of a CMS, for example, opt out on a per-account basis.
I'm trying to understand whether the robots meta tag can be used for a more granular, per-page permission, for example:
<meta name="robots" content="noindex">
Also, do LLM crawlers even honor noindex as an opt-out, or is there a newer content keyword to use instead, for example noteach or nolearn?
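To make the per-account idea concrete, here's a rough sketch of what I imagine the CMS would have to do (Python, with made-up names; it assumes noindex is actually honored by these crawlers, which is exactly the part I'm unsure about):

```python
# Hypothetical sketch only: "User" and "robots_meta_for" are made-up names,
# not a real CMS or crawler API.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    allow_ai_training: bool  # per-account preference stored by the CMS

def robots_meta_for(user: User) -> str:
    """Return the robots meta tag to emit on this user's pages.

    Uses plain "noindex" because that's the only directive I know crawlers
    might respect; swap in "noteach"/"nolearn" if such a keyword ever exists.
    """
    if user.allow_ai_training:
        return ""  # emit nothing, default crawling rules apply
    return '<meta name="robots" content="noindex">'

if __name__ == "__main__":
    print(robots_meta_for(User("alice", allow_ai_training=False)))
    print(robots_meta_for(User("bob", allow_ai_training=True)))
```

The obvious downside is that noindex also removes the page from regular search results, which is a much bigger hammer than just opting out of LLM training, hence the question about a dedicated keyword.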