Dynamically block IPs impersonating Googlebot via reverse lookup using fail2ban
Hello, I'm Munou.
Last time:
Investigating request IPs claiming to be Google bots, including abuse - SOULMINIGRIG
In that post, I confirmed that quite a few groups of IPs send requests while impersonating Googlebot.
So how do I block them? Ideally, fail2ban would detect these requests and run a script of my choosing to judge each IP, and I found a hint in the following issue.
apache-fakegooglebot: whitelist · Issue #1318 · fail2ban/fail2ban · GitHub
Put simply: have the filter match every request that claims to be Googlebot, then let a script set as ignorecommand decide which IPs to exempt, much as ignoreip would.
This should work.
jail.local
Add the following.
[fake-googlebot]
enabled = true
filter = fake-googlebot
port = http,https
logpath = /var/log/nginx/access.log
findtime = 1w
maxretry = 1
bantime = 99999w
ignorecommand = /usr/local/bin/check_googlebot.sh <ip>
action = pf[name=fake-googlebot]
This jail refers to a filter named fake-googlebot, so that filter has to be created.
The shell script specified in ignorecommand has to be created as well.
filter.d/fake-googlebot.conf
Match broadly, on any request whose user agent contains Googlebot:
[Definition]
failregex = ^<HOST> - .*"(GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) .*" \d+ \d+ ".*" ".*Googlebot.*"$
ignoreregex =
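Before enabling the jail, it can help to check that the pattern really catches a typical access-log line. The sketch below is only a rough stand-in using grep -E: `<HOST>` is approximated by a loose IP-like character class and `\d` is written as `[0-9]`, so it is not identical to what fail2ban does. The function name `matches_fake_googlebot_filter` and the sample log line are made up for illustration.

```sh
#!/bin/sh
# Rough approximation of the failregex for quick experiments.
# Assumption: <HOST> is replaced by a loose IPv4/IPv6 character class;
# fail2ban's real <HOST> expansion is more elaborate.
matches_fake_googlebot_filter() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9A-Fa-f.:]+ - .*"(GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) .*" [0-9]+ [0-9]+ ".*" ".*Googlebot.*"$'
}

# Hypothetical nginx "combined" log line (203.0.113.5 is a documentation address).
SAMPLE='203.0.113.5 - - [19/Apr/2026:01:58:18 +0900] "GET / HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'

if matches_fake_googlebot_filter "$SAMPLE"; then
    echo "matched"
else
    echo "not matched"
fi
```

For the real thing, fail2ban ships the fail2ban-regex tool, which takes a log file and a filter file as arguments, for example `fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/fake-googlebot.conf` (adjust the filter path to your installation).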
/usr/local/bin/check_googlebot.sh
fail2ban runs ignorecommand for each IP that the filter matched and looks at its exit status: if the command exits with 0, the IP is ignored; if it exits with a non-zero status, the IP is treated as a ban target.
In other words, the script receives the IP address as an argument, and its exit status decides whether the IP is banned or skipped.
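The exit-status contract can be tried out in isolation. The following is a minimal sketch, not fail2ban code: `decide` is a hypothetical helper that runs any command and reports the decision its exit status would lead to.

```sh
#!/bin/sh
# Maps an exit status to a decision: 0 = ignore the IP, non-zero = ban it.
# "decide" is a hypothetical helper, not part of fail2ban.
decide() {
    if "$@" >/dev/null 2>&1; then
        echo "ignore"
    else
        echo "ban"
    fi
}

decide true     # stands in for a check that succeeds
decide false    # stands in for a check that fails
```

Once the script below is installed, `decide /usr/local/bin/check_googlebot.sh 66.249.74.78` shows which way a given IP would go.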
#!/bin/sh
# Exit 0 if the IP is a verified Googlebot (fail2ban ignores it),
# exit 1 otherwise (fail2ban bans it).
IP="$1"
LOG="/var/log/check_googlebot.log"

# Reverse lookup
HOST=$(getent hosts "$IP" | awk '{print $2}' | head -n1)
if [ -z "$HOST" ]; then
    echo "[$(date)] DENY $IP: no PTR" >> "$LOG"
    exit 1
fi

# Check if it's a Google-related domain
case "$HOST" in
    *.googlebot.com|*.google.com)
        ;;
    *)
        echo "[$(date)] DENY $IP: invalid domain ($HOST)" >> "$LOG"
        exit 1
        ;;
esac

# Forward lookup and check if it matches the original IP
if getent hosts "$HOST" | awk '{print $1}' | grep -Fxq "$IP"; then
    echo "[$(date)] ALLOW $IP: valid Googlebot ($HOST)" >> "$LOG"
    exit 0
else
    echo "[$(date)] DENY $IP: mismatch ($HOST)" >> "$LOG"
    exit 1
fi
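The domain check is the part that is easiest to get subtly wrong, so here is a small sketch that isolates it and can be run without any DNS access. `is_google_host` is a hypothetical helper, and stripping a trailing dot is an extra precaution that the script above does not take.

```sh
#!/bin/sh
# Hypothetical helper isolating the suffix test from the script above.
is_google_host() {
    # Strip one trailing dot, in case a resolver returns a fully qualified name.
    host="${1%.}"
    case "$host" in
        *.googlebot.com|*.google.com) return 0 ;;
        *) return 1 ;;
    esac
}

for h in crawl-66-249-74-78.googlebot.com \
         googlebot.com.example.net \
         fakegooglebot.com; do
    if is_google_host "$h"; then
        echo "$h: accepted"
    else
        echo "$h: rejected"
    fi
done
```

Because the pattern requires a dot before the domain, lookalike names such as fakegooglebot.com are rejected, and names that merely start with googlebot.com are rejected too. The forward lookup is still needed, since anyone who controls the reverse zone for an IP can publish an arbitrary PTR name.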
The reason I don't use the host command is that it is not available everywhere. It ships in a separate package (bind-utils on RHEL-based systems, bind9-host on Debian-based ones), so instead I retrieve the PTR record with getent hosts, which is available wherever glibc is installed.
[SOLVED] Host command / Newbie Corner / Arch Linux Forums
Verification
Try running it and confirm that it returns 0 for a Google bot IP.
# sh /usr/local/bin/check_googlebot.sh 66.249.74.78
# echo $?
0
What about a different IP? Let's try entering my own server's IP.
# sh /usr/local/bin/check_googlebot.sh 163.44.113.145
# echo $?
1
It seems to be judging correctly.
fail2ban
Restart fail2ban to apply the new jail and filter.
service fail2ban restart
fail2ban-client status
Check the jail status to make sure that genuine Googlebot IPs seen recently have not been banned by mistake.
# fail2ban-client status fake-googlebot
Status for the jail: fake-googlebot
|- Filter
| |- Currently failed: 0
| |- Total failed: 0
| `- File list: /var/log/nginx/access.log
`- Actions
|- Currently banned: 0
|- Total banned: 0
`- Banned IP list:
A few entries from when I ran the script against my own server's IP a moment ago are still there, but the log shows that Googlebot IPs were identified correctly.
# tail /var/log/check_googlebot.log
[Sun Apr 19 01:58:18 JST 2026] ALLOW 66.249.74.65: valid Googlebot (crawl-66-249-74-65.googlebot.com)
[Sun Apr 19 01:58:18 JST 2026] ALLOW 66.249.74.78: valid Googlebot (crawl-66-249-74-78.googlebot.com)
[Sun Apr 19 01:58:18 JST 2026] ALLOW 66.249.74.64: valid Googlebot (crawl-66-249-74-64.googlebot.com)
[Sun Apr 19 01:58:18 JST 2026] ALLOW 66.249.74.64: valid Googlebot (crawl-66-249-74-64.googlebot.com)
[Sun Apr 19 01:58:19 JST 2026] ALLOW 66.249.74.64: valid Googlebot (crawl-66-249-74-64.googlebot.com)
[Sun Apr 19 01:58:19 JST 2026] ALLOW 66.249.74.78: valid Googlebot (crawl-66-249-74-78.googlebot.com)
[Sun Apr 19 01:58:19 JST 2026] ALLOW 66.249.74.64: valid Googlebot (crawl-66-249-74-64.googlebot.com)
[Sun Apr 19 01:58:19 JST 2026] ALLOW 66.249.74.78: valid Googlebot (crawl-66-249-74-78.googlebot.com)
[Sun Apr 19 03:56:30 JST 2026] ALLOW 66.249.74.78: valid Googlebot (crawl-66-249-74-78.googlebot.com)
[Sun Apr 19 03:58:18 JST 2026] DENY 163.44.113.145: invalid domain (v163-44-113-145.v1i0.static.cnode.jp)
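As the log grows, reading it with tail gets tedious. The following sketch counts ALLOW and DENY entries per IP. `summarize_check_log` is a hypothetical helper, and it assumes every line has the shape `[<date>] <VERDICT> <ip>: <reason>` that the script writes.

```sh
#!/bin/sh
# Hypothetical helper: count ALLOW/DENY entries per IP in the check log.
# Assumes each line looks like "[<date>] <VERDICT> <ip>: <reason>".
summarize_check_log() {
    # Drop the leading "[date] " part, then count "<VERDICT> <ip>" pairs.
    sed 's/^\[[^]]*] //' "$1" |
        awk '{ ip = $2; sub(/:$/, "", ip); count[$1 " " ip]++ }
             END { for (k in count) print count[k], k }' |
        sort
}
```

Running `summarize_check_log /var/log/check_googlebot.log` prints one line per verdict and IP, with the number of occurrences first.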