DoS attack from Google IP range


I believe I'm being attacked with a flood of requests (about 5 per second, all day long) from a Google IP range (66.249.65.*; maybe IP spoofing?). The requests carry the Googlebot signature (Googlebot/2.1; +http://www.google.com/bot.html) in the User-Agent HTTP header, but they try to fetch an old URL (I deactivated it because it was consuming a lot of CPU and money). If I blacklist this IP range, I block the legitimate Googlebot too :( .

And the irony: my app (http://expoonews.com) is hosted on Google App Engine!

How can I stop this behavior without blocking Googlebot?

Below is a sample of my log to make this clearer.

 A 2014-11-25 19:41:19.145 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Flincoln.pioneer.kohalibrary.com%2Fcgi-bin%2Fkoha%2Fopac-search.pl%3Fidx%3Disbn%26q%3D1842172131%26do%3DSearch
66.249.65.82 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Flincoln.pioneer.kohalibrary.com%2Fcgi-bin%2Fkoha%2Fopac-search.pl%3Fidx%3Disbn%26q%3D1842172131%26do%3DSearch HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:19.550 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fwww.dnevniavaz.ba%2Fkultura%2Ffilm%2Fprica-o-hapsenju-ratnog-zlocinca
66.249.65.86 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Fwww.dnevniavaz.ba%2Fkultura%2Ffilm%2Fprica-o-hapsenju-ratnog-zlocinca HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:19.956 404 234 B 12ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNewcastle_Local_Municipality
66.249.65.78 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNewcastle_Local_Municipality HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=12 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:20.426 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Ftools.wmflabs.org%2Fgeohack%2Fgeohack.php%3Fpagename%3DRio_Grande_County%252C_Colorado%26params%3D37.61_N_-106.39_E_type%3Aadm2nd_region%3AUS-CO_source%3AUScensus1990
66.249.65.86 - - [25/Nov/2014:13:41:20 -0800] "GET /AddPageAction?url=http%3A%2F%2Ftools.wmflabs.org%2Fgeohack%2Fgeohack.php%3Fpagename%3DRio_Grande_County%252C_Colorado%26params%3D37.61_N_-106.39_E_type%3Aadm2nd_region%3AUS-CO_source%3AUScensus1990 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:20.763 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F%23cite_ref-Istanbul_43-1
66.249.65.86 - - [25/Nov/2014:13:41:20 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F%23cite_ref-Istanbul_43-1 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:21.166 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DHMAS%2520Pirie%26action%3Dhistory
66.249.65.86 - - [25/Nov/2014:13:41:21 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DHMAS%2520Pirie%26action%3Dhistory HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:21.571 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DUniversity_of_Engineering_and_Technology_Taxila_Chakwal_Campus_University_of_Engineering_and_Technology_Taxila_Chakwal_Campus%26action%3Dedit%26redlink%3D1
66.249.65.78 - - [25/Nov/2014:13:41:21 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DUniversity_of_Engineering_and_Technology_Taxila_Chakwal_Campus_University_of_Engineering_and_Technology_Taxila_Chakwal_Campus%26action%3Dedit%26redlink%3D1 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16 

There are 6 answers

0
Fulvius On BEST ANSWER

I think I solved the problem by removing the URL that accepted a parameter (the URL of another page).

I think the bot was probing for openly accessible URLs that it could use to forge visits to a particular site (perhaps to inflate its traffic numbers). My URL was clearly exposed: all it required was the target address, passed in a GET request.

But thanks for the answers, guys.

0
HayatoY On

You should at least write a robots.txt that blocks the genuine Googlebot from the old URLs. It keeps retrying indexed URLs frequently until the URL returns 404 or is otherwise marked as deleted.

I am not sure it is really a fake bot, because Googlebot itself can behave like a spammer, requesting too many pages in a short period.
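One way to tell a fake bot from the genuine one is the reverse-then-forward DNS check that Google documents for verifying Googlebot: the IP should reverse-resolve to a host under googlebot.com or google.com, and that host should resolve back to the same IP. Below is a minimal sketch in plain Python, not specific to App Engine. The function name and the two resolver parameters are hypothetical; the hooks are there only so the check can be exercised without network access.

```python
import socket

# Domains that genuine Googlebot hosts are documented to reverse-resolve to.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes the reverse + forward DNS check.

    reverse_lookup and forward_lookup are hypothetical hooks (not part of
    any Google API) that default to the standard socket resolvers.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must lead back to the original address.
        return ip in forward_lookup(host)
    except (socket.herror, socket.gaierror):
        # No PTR record or failed resolution: treat as not verified.
        return False
```

Calling is_genuine_googlebot("66.249.65.82") with the default resolvers performs two DNS lookups, so in practice the result would need to be cached per IP.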

To reduce the number of requests from Googlebot (fake or genuine), how about something like this?

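from google.appengine.api import memcache

# Allow each IP at most 100 requests per minute.
# Runs inside a webapp2 request handler; bot_ip is the client IP address.
dos_n = memcache.get(bot_ip)
if dos_n is not None:
    if dos_n >= 100:
        self.abort(400)
    memcache.incr(bot_ip)
else:
    # First request in this window: start the counter, which expires after 60 s.
    memcache.add(key=bot_ip, value=1, time=60)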

Just for information: if the site were not hosted on GAE, you could change the crawl rate in Webmaster Tools: https://www.google.com/webmasters/tools/

1
mattd On

You can try to disallow that specific directory or page using robots.txt http://www.robotstxt.org/robotstxt.html
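As a rough illustration, the sketch below uses Python's standard urllib.robotparser to check that a two-line robots.txt would tell a compliant crawler to stay away from the old /AddPageAction URL seen in the question's log. The rule text and the is_allowed helper are assumptions made for the example, not the asker's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block every crawler from the old endpoint.
ROBOTS_TXT = """\
User-agent: *
Disallow: /AddPageAction
"""


def is_allowed(path, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

With these rules, is_allowed("/AddPageAction?url=x") is False while is_allowed("/") is True. This only affects crawlers that honor robots.txt; a fake bot can ignore it.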

2
Paul Collingwood On

A dos.yaml file in the root directory of your application (alongside app.yaml) configures DoS Protection Service blacklists for your application. The following is an example dos.yaml file:

blacklist:
- subnet: 1.2.3.4
  description: a single IP address
- subnet: 1.2.3.4/24
  description: an IPv4 subnet
- subnet: abcd::123:4567
  description: an IPv6 address
- subnet: abcd::123:4567/48
  description: an IPv6 subnet

https://cloud.google.com/appengine/docs/python/config/dos
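To see what a given subnet entry would cover before deploying it, the standard ipaddress module can be used. This is a small sketch that assumes dos.yaml entries match by ordinary CIDR containment; the blocked_by helper is hypothetical, and the IPs are the ones from the log sample in the question.

```python
import ipaddress

# Client IPs copied from the log sample in the question.
LOGGED_IPS = ["66.249.65.82", "66.249.65.86", "66.249.65.78"]


def blocked_by(subnet, ips):
    """Return the IPs that a blacklist entry for subnet would match."""
    # strict=False accepts entries such as 1.2.3.4/24 that have host bits set.
    network = ipaddress.ip_network(subnet, strict=False)
    return [ip for ip in ips if ipaddress.ip_address(ip) in network]


print(blocked_by("66.249.65.0/24", LOGGED_IPS))  # all three logged IPs
print(blocked_by("66.249.65.82", LOGGED_IPS))    # only 66.249.65.82
```

A /24 entry matches every address in 66.249.65.*, which is why blacklisting the whole range would also shut out the genuine Googlebot.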

0
R V Shiva Kumar On

It seems that Googlebot is picking up injected links that are stored either on your own website or on another site, where an attacker has hardcoded these URLs and is using Googlebot to launch the attack.

A Web Application Firewall could be a good solution: it can detect these signatures and explicitly deny such requests.

Search Google for Apache ModSecurity or Nginx NAXSI!

0
farhad sabahi On

This suspicious activity is probably related to Googlebot crawling your URLs. If you've recently added or changed a page on your site, you can ask Google to (re)index it using the Fetch as Google tool.