Sitemap URL not defined in robots.txt Websiteplanet


This is what my robots.txt looks like:

User-Agent: *
Disallow: /info/privacy
Disallow: /info/cookies
Allow: /
Allow : /search/Jaén
Allow : /search/Tarragona
Allow : /search/Rioja
Sitemap: https://www.alquileres.xyz/sitemap.xml

but when I check the sitemap on Websiteplanet, it says "Sitemap URL not defined in robots.txt" and "Sitemap does not return correct Content-Type header". I checked in Postman and it is application/xml; I don't know if it has to be another content type.

Also, isn't it redundant to specify other URLs when I include Allow: /? My sitemap is eventually going to be massive; does my robots.txt need to be as well?

I'm working with Next.js.

I've tried generating both the sitemap and robots.txt from .ts files, as Next.js offers with its Metadata object type. I've also tried placing the plain files in the app directory.

There is 1 answer.

Stephen Ostermiller:

The Content-Type for robots.txt should be text/plain. Maybe this would be helpful for figuring out how to change it: Changing the header of the response on NodeJS using request
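If you want to verify the header programmatically rather than eyeballing it in Postman, a small check can be written against the Fetch API (built into Node 18+ and browsers). This is an illustrative sketch; `hasTextPlainContentType` is a made-up helper name, not part of any library:

```typescript
// Sketch: check that a fetched robots.txt response carries the
// expected Content-Type. Works with any Fetch-API Response object.
function hasTextPlainContentType(res: Response): boolean {
  const contentType = res.headers.get("content-type") ?? "";
  // The header may carry a charset suffix, e.g. "text/plain; charset=utf-8",
  // so compare only the media type before the semicolon.
  return contentType.split(";")[0].trim().toLowerCase() === "text/plain";
}

// Usage against a live site (uncomment to run):
// const res = await fetch("https://www.alquileres.xyz/robots.txt");
// console.log(hasTextPlainContentType(res));
```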

The default assumption that bots make is that everything on your site is crawlable. There is no need to include Allow rules unless they disagree with Disallow rules. In fact, in the original robots.txt spec, there was no Allow directive at all.
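To see why `Allow: /` adds nothing, here is a toy sketch of the longest-match behavior modern crawlers use (per RFC 9309): when no rule matches a path, crawling is allowed by default. This is a deliberate simplification for illustration, not a real crawler's matcher (no wildcards, no `$` anchors):

```typescript
type Rule = { allow: boolean; path: string };

// Toy matcher: the most specific (longest) matching rule wins;
// if no rule matches at all, crawling is allowed by default.
function isAllowed(url: string, rules: Rule[]): boolean {
  const matches = rules.filter((r) => url.startsWith(r.path));
  if (matches.length === 0) return true; // default: everything is crawlable
  matches.sort((a, b) => b.path.length - a.path.length);
  return matches[0].allow;
}

const rules: Rule[] = [
  { allow: false, path: "/info/privacy" },
  { allow: false, path: "/info/cookies" },
];

// "/search/Jaén" is allowed even though no Allow rule mentions it:
console.log(isAllowed("/search/Jaén", rules)); // true
console.log(isAllowed("/info/privacy", rules)); // false
```

An Allow rule only matters when it needs to carve an exception out of a broader Disallow, e.g. `Disallow: /info` plus `Allow: /info/public`.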

Allowing /search URLs worries me. If those are site-search results pages, that could be terrible for SEO. Search engines don't like sending users from their search results only to land on another set of search results; they consider it a bad user experience and often penalize sites that let their search results get crawled and indexed. If your /search URLs represent a database query rather than free-form text entry on a search results page, I would recommend using some other word in your URLs.

You should let your privacy and cookie policies get crawled. Search engines consider sites that have them to be higher quality. If you let them get crawled, they will rarely show up in search results unless somebody searches for them specifically, like "<your site> privacy policy".

I'd recommend you use this:

User-Agent: *
Disallow: /search
Sitemap: https://www.alquileres.xyz/sitemap.xml
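Since you're on Next.js, the same recommended file can be generated with the app-router metadata convention (app/robots.ts), which Next.js serves with a text/plain Content-Type for you. A minimal sketch, with the return shape declared locally so the snippet stands alone; in a real project you would instead type it as `MetadataRoute.Robots` imported from `next`:

```typescript
// app/robots.ts — Next.js metadata route that emits robots.txt.
// The local Robots type mirrors Next.js's MetadataRoute.Robots shape.
type Robots = {
  rules: { userAgent: string; allow?: string; disallow?: string };
  sitemap: string;
};

export default function robots(): Robots {
  return {
    rules: {
      userAgent: "*",
      disallow: "/search",
    },
    sitemap: "https://www.alquileres.xyz/sitemap.xml",
  };
}
```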