We have a page with over 1000 images. We show only 10 on each page and load them with ajax when people "see the images"; we also use DataTables. Everything works fine. However, in Google Webmaster Tools I just got thousands of 404 errors for pages like this: http://example.com/ajax/%5C%22http:%5C/%5C/example.com%5C/image%5C/1937%5C/image-name%5C%22 Of course, if I go to this page I get a 404 error, because no such page exists, but I don't understand why Google fetches URLs like this. An image URL looks like this: example.com/image/a 4 digit number here/image-name
Because the content is loaded with ajax, it creates that kind of URL, which you (as a visitor) never see, but somehow Google fetches it.
I have now added /ajax to robots.txt to disallow fetching it, but I'm not sure if that's the best idea.
Any help would be appreciated.
The most likely reason is that your ajax directory (and possibly other directories) is readable and lists your PHP files, which Google can access and parse for more URLs.
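For example, if one of your scripts echoes JSON with escaped strings like \"http:\/\/example.com\/image\/1937\/image-name\", Google will find that string and try to navigate to it as a link. Because it starts with a backslash rather than a scheme, it is treated as relative to /ajax/ and resolves to http://example.com/ajax/%5C%22http:%5C/%5C/example.com%5C/image%5C/1937%5C/image-name%5C%22, which is a 404.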
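To see where the odd URL comes from, the reported address can be percent-decoded. The sketch below is only an illustration: the escaped fragment is an assumption about what the JSON response might contain (inferred from the 404 URL in the question), and the variable names are made up for this example.

```python
from urllib.parse import quote, unquote

# URL from the 404 report, copied from the question.
reported = (
    "http://example.com/ajax/"
    "%5C%22http:%5C/%5C/example.com%5C/image%5C/1937%5C/image-name%5C%22"
)

# Percent-decoding shows what the crawler picked up:
# %5C is a backslash and %22 is a double quote.
decoded = unquote(reported)
print(decoded)

# Assumed (hypothetical) fragment as it might appear inside the JSON output,
# with quotes and slashes escaped.
escaped_href = r'\"http:\/\/example.com\/image\/1937\/image-name\"'

# The fragment starts with a backslash, not a scheme or a slash, so it is
# treated as a path relative to /ajax/. Re-encoding it reproduces the report.
rebuilt = "http://example.com/ajax/" + quote(escaped_href, safe=":/")
print(rebuilt == reported)
```

If the rebuilt URL matches the reported one, that supports the idea that Google is reading the raw JSON output and treating the escaped string as a relative link.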
You should stop http://example.com/ajax/ from displaying directory contents, either with an .htaccess file or by dropping an empty index.html there. You've also disallowed /ajax in your robots.txt, so this should also work. Try both.
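One way to sanity-check the robots.txt rule is Python's standard urllib.robotparser. This is a minimal sketch: the robots.txt content below is an assumption about what your file might look like after adding /ajax, not a copy of your actual file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content (hypothetical; adjust to match the real file).
robots_lines = [
    "User-agent: *",
    "Disallow: /ajax",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# The URL from the 404 report and a normal image URL from the question.
bad_url = (
    "http://example.com/ajax/"
    "%5C%22http:%5C/%5C/example.com%5C/image%5C/1937%5C/image-name%5C%22"
)
good_url = "http://example.com/image/1937/image-name"

# can_fetch returns False when the rules forbid the given agent from the URL.
blocked = parser.can_fetch("Googlebot", bad_url)
allowed = parser.can_fetch("Googlebot", good_url)
print("ajax URL fetchable:", blocked)
print("image URL fetchable:", allowed)
```

With a rule like this, anything under /ajax should be reported as not fetchable while the real image pages stay crawlable.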