How to block "bot*" bot via .htaccess


I have the following entry in my AWStats report:

Unknown robot (identified by 'bot*')

How can I block this bot?
I tried each of the following separately, but none of them seems to catch it:

RewriteCond %{HTTP_USER_AGENT} ^bot* 

RewriteCond %{HTTP_USER_AGENT} bot\* 

RewriteCond %{HTTP_USER_AGENT} bot[*]

Here is the full .htaccess code I am using:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^bot*
RewriteRule .? - [F,L]

I tested three regex values (^bot*, bot\*, bot[*]) in the second line; none of them stopped the bot.


There are 2 answers

Mike Rockétt

The asterisk (*) is not literal. AWStats is simply reporting the rule it used to decide that the request came from a bot. In your case, bot* means that the user agent string started with bot, so AWStats counted it as a match.

As the asterisk is not literal, you can use the following instead:

# matches bot* (the same as ^bot.*$)
RewriteCond %{HTTP_USER_AGENT} ^bot [OR]
# matches *bot (the same as ^.*bot$)
RewriteCond %{HTTP_USER_AGENT} bot$

Note: It is better to check your access logs to see exactly what these user agents are and block them specifically. You don't want to end up blocking bots that you actually want.
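
As a rough sketch of that more targeted approach: once the access log shows the full user-agent strings, each one can get its own condition. The names ExampleScraper and SomeOtherBot below are hypothetical placeholders, not real crawlers; substitute whatever actually appears in your log.

```apache
RewriteEngine On

# Hypothetical user-agent substrings; replace with names taken from your access log.
# [NC] makes the match case-insensitive, [OR] chains this condition to the next one.
RewriteCond %{HTTP_USER_AGENT} ExampleScraper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SomeOtherBot [NC]

# Answer matching requests with 403 Forbidden.
RewriteRule ^ - [F]
```

Note that Apache comments have to sit on their own line; text after a directive on the same line is read as extra arguments.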


Recommendation: Change your rule from RewriteRule .? - [F,L] to RewriteRule ^ - [F,L]
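
For reference, here is how mod_rewrite reads the three patterns from the question, followed by a hedged variant of the rule. The question does not show the real user-agent string, so the case-sensitivity remark is only a possible explanation, not a confirmed one.

```apache
# How mod_rewrite (PCRE syntax) interprets the patterns from the question:
#
#   ^bot*    "bo" at the start, then zero or more "t"
#            (so it also matches agents starting with "boat" or "bob")
#   bot\*    the literal characters "bot*" anywhere in the string
#   bot[*]   the same as bot\*: a literal asterisk right after "bot"
#
# Conditions are case-sensitive by default, so none of them matches "Bot" or "BOT".

RewriteEngine On

# Agents that start or end with "bot", in any letter case.
RewriteCond %{HTTP_USER_AGENT} ^bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bot$ [NC]

# [F] already implies [L], so [F,L] and [F] behave the same here.
RewriteRule ^ - [F]
```

The pattern bot$ is broad and will also catch well-known crawlers whose names end in "bot", so it may be worth narrowing before relying on it.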

Giritharan V

You can block bots by their exact names in the .htaccess file. The example below should help; I currently use the same setup, and it saves server resources.

SetEnvIfNoCase User-Agent "Yandex" bad_bot    
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot    
SetEnvIfNoCase User-Agent "MJ12bot" bad_bot

<IfModule mod_authz_core.c>
 <Limit GET POST>
  <RequireAll>
   Require all granted
   Require not env bad_bot
  </RequireAll>
 </Limit>
</IfModule>
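
A possible variation, offered only as a sketch: the block above is limited to the methods named in `<Limit>` and is skipped entirely on servers without mod_authz_core (Apache 2.2). The version below drops `<Limit>` so the restriction applies to every method, and adds a fallback using the older access directives. It assumes mod_setenvif is loaded and that the server's AllowOverride setting permits these directives in .htaccess.

```apache
SetEnvIfNoCase User-Agent "MJ12bot" bad_bot

<IfModule mod_authz_core.c>
  # Apache 2.4: no <Limit>, so this applies to all request methods.
  <RequireAll>
    Require all granted
    Require not env bad_bot
  </RequireAll>
</IfModule>

<IfModule !mod_authz_core.c>
  # Apache 2.2: equivalent rule with the older Order/Allow/Deny directives.
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
</IfModule>
```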

Let me know if you have any queries.