I believe I've been attacked with multiples request (5/sec all day long) from google ip range (66.249.65.* - maybe a ip spoofing??). This requests have googlebot signature (Googlebot/2.1; +http://www.google.com/bot.html) on http header, but it try to get an old url (I desactivate it, cause it have been consume a lot of cpu/$). If I put this ip range on black list, I block the legit googlebot too :( .
And the irony: My app (http://expoonews.com) is hosted by google app engine service!
How can I stop this behavior without block google bot?
Below a sample of my log to better understand.
A 2014-11-25 19:41:19.145 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Flincoln.pioneer.kohalibrary.com%2Fcgi-bin%2Fkoha%2Fopac-search.pl%3Fidx%3Disbn%26q%3D1842172131%26do%3DSearch
66.249.65.82 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Flincoln.pioneer.kohalibrary.com%2Fcgi-bin%2Fkoha%2Fopac-search.pl%3Fidx%3Disbn%26q%3D1842172131%26do%3DSearch HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16
A 2014-11-25 19:41:19.550 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fwww.dnevniavaz.ba%2Fkultura%2Ffilm%2Fprica-o-hapsenju-ratnog-zlocinca
66.249.65.86 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Fwww.dnevniavaz.ba%2Fkultura%2Ffilm%2Fprica-o-hapsenju-ratnog-zlocinca HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16
A 2014-11-25 19:41:19.956 404 234 B 12ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNewcastle_Local_Municipality
66.249.65.78 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNewcastle_Local_Municipality HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=12 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16
A 2014-11-25 19:41:20.426 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Ftools.wmflabs.org%2Fgeohack%2Fgeohack.php%3Fpagename%3DRio_Grande_County%252C_Colorado%26params%3D37.61_N_-106.39_E_type%3Aadm2nd_region%3AUS-CO_source%3AUScensus1990
66.249.65.86 - - [25/Nov/2014:13:41:20 -0800] "GET /AddPageAction?url=http%3A%2F%2Ftools.wmflabs.org%2Fgeohack%2Fgeohack.php%3Fpagename%3DRio_Grande_County%252C_Colorado%26params%3D37.61_N_-106.39_E_type%3Aadm2nd_region%3AUS-CO_source%3AUScensus1990 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16
A 2014-11-25 19:41:20.763 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F%23cite_ref-Istanbul_43-1
66.249.65.86 - - [25/Nov/2014:13:41:20 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F%23cite_ref-Istanbul_43-1 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16
A 2014-11-25 19:41:21.166 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DHMAS%2520Pirie%26action%3Dhistory
66.249.65.86 - - [25/Nov/2014:13:41:21 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DHMAS%2520Pirie%26action%3Dhistory HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16
A 2014-11-25 19:41:21.571 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DUniversity_of_Engineering_and_Technology_Taxila_Chakwal_Campus_University_of_Engineering_and_Technology_Taxila_Chakwal_Campus%26action%3Dedit%26redlink%3D1
66.249.65.78 - - [25/Nov/2014:13:41:21 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DUniversity_of_Engineering_and_Technology_Taxila_Chakwal_Campus_University_of_Engineering_and_Technology_Taxila_Chakwal_Campus%26action%3Dedit%26redlink%3D1 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16
I think I solved the problem by removing the url receiving parameters (url to another page).
I think the bot tries to figure out which web url is open to forge access to a particular site (for inflating the amount access, perhaps). My url was clearly exposed (it was only pass the address was a GET at the same time).
But thanks for the answers guys.