Does Googlebot index links located on pages blocked by robots.txt?


My robots.txt:

User-agent: googlebot

disallow: /xxx/y.html

y.html has lots of links like "/mmm/a.html" and "/asd/b.html".

My question: will Google index "/mmm/a.html" and "/asd/b.html"?

These links appear only in "/xxx/y.html".


There is 1 answer

Answer by unor (best answer)

Note that your robots.txt must not have line breaks in a record (i.e., between User-agent and Disallow), so it should be:

User-agent: googlebot
Disallow: /xxx/y.html
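To see why the blank line matters, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It only shows how that one parser treats the two variants; Googlebot's own parser may be more lenient, so treat the result as an illustration rather than as a statement about Google. The helper `is_allowed` and the constant names are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt content from a string and test one URL (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The record as written in the question: a blank line splits it in two.
WITH_BLANK_LINE = "User-agent: googlebot\n\nDisallow: /xxx/y.html\n"

# The corrected record: User-agent and Disallow on adjacent lines.
WITHOUT_BLANK_LINE = "User-agent: googlebot\nDisallow: /xxx/y.html\n"

URL = "http://example.com/xxx/y.html"

# Python's parser ends the record at the blank line, so the Disallow
# line no longer belongs to any user agent and is ignored.
print(is_allowed(WITH_BLANK_LINE, "googlebot", URL))     # True (not blocked)
print(is_allowed(WITHOUT_BLANK_LINE, "googlebot", URL))  # False (blocked)
```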

This record disallows "googlebot" from crawling URLs whose paths start with /xxx/y.html. So it will block URLs like:

  • http://example.com/xxx/y.html
  • http://example.com/xxx/y.html.zip
  • http://example.com/xxx/y.html5
  • http://example.com/xxx/y.html/foo
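The prefix matching can be checked with a short sketch, again using Python's standard-library `urllib.robotparser` as a stand-in for a robots.txt-aware crawler (an assumption: it is not Google's parser). The `example.com` URLs are the ones from the list above, plus the two link targets from the question.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /xxx/y.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = [
    "http://example.com/xxx/y.html",
    "http://example.com/xxx/y.html.zip",
    "http://example.com/xxx/y.html5",
    "http://example.com/xxx/y.html/foo",
    "http://example.com/mmm/a.html",  # link target from the question
    "http://example.com/asd/b.html",  # link target from the question
]

# Map each URL to whether "googlebot" may fetch it under this record.
results = {url: parser.can_fetch("googlebot", url) for url in urls}

for url, allowed in results.items():
    print(f"{'allowed' if allowed else 'blocked'}  {url}")
```

The four /xxx/y.html… URLs come out as blocked, while /mmm/a.html and /asd/b.html are allowed. "Allowed" only means the record does not forbid crawling them; the bot still has to learn about those URLs from somewhere.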

This means that "googlebot" will never visit these pages. So if a link appears only on one of these pages, the bot will not find it there in the first place.

However, if Google learns about such a link in a different way, it will probably visit the linked URL (unless that URL is also blocked by robots.txt). Such other ways could be, for example, tools that send statistics to Google (like Google Toolbar, Google Analytics, etc.), other pages that include the link, a sitemap that lists it, submitting the link to Google directly, and so on.