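Investigating the IPs of requests that claim to be Google bots (including abuse)
Hello, I'm 無能. On a whim, I decided to look into Google's bots.
I'll also check how many of them are impostors.
Investigation
This time I'll check the Nginx server logs.
User-Agents containing Google
Target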
$ ls
access.log access.log.0 access.log.1 access.log.2 access.log.3 access.log.4 access.log.5 access.log.6
$ cat ./* | wc -l
746978
$ rg Google | awk -F\" '{print $6}' | sort | uniq -c | sort -n
1 BlackBerry7520/4.0.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/5.0.3.3 UP.Link/5.1.2.12 (Google WAP Proxy/1.0)
1 GoogleOther-Image/1.0
1 Googlebot/2.1 ( http://www.googlebot.com/bot.html)
1 Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 6.0.2900.2180)
1 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.224 Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)
1 Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13
2 DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; http://www.google.com/bot.html)
2 Googlebot-News
2 Googlebot-Video/1.0
2 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.7680.177 Mobile Safari/537.36 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
3 FeedFetcher-Google; ( http://www.google.com/feedfetcher.html)
3 Googlebot-Image/1.0
3 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
3 Mozilla/5.0 (iPhone; U; CPU iPhone OS) (compatible; Googlebot-Mobile/2.1; http://www.google.com/bot.html)
4 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/146.0.7680.164 Safari/537.36
5 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
10 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/88.0.4324.175 Safari/537.36
12 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/81.0.4044.108 Safari/537.36
12 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/84.0.4147.108 Safari/537.36
13 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/77.0.3865.99 Safari/537.36
13 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/89.0.4389.127 Safari/537.36
13 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/89.0.4389.93 Safari/537.36
14 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/89.0.4389.130 Safari/537.36
15 AdsBot-Google (+http://www.google.com/adsbot.html)
15 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/87.0.4280.90 Safari/537.36
17 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/84.0.4147.140 Safari/537.36
18 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/85.0.4183.122 Safari/537.36
19 Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
19 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/83.0.4103.118 Safari/537.36
41 Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/146.0.7680.177 Safari/537.36
104 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
4229 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.7680.177 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
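The `awk -F\"` step works because, in Nginx's combined log format, splitting a line on double quotes leaves the User-Agent in field 6. Below is a minimal sketch of the same idea that matches "Google" only inside the User-Agent field instead of anywhere in the line. It assumes the combined log format and plain log lines on stdin (without the `filename:` prefix that `rg` adds when searching several files); `google_ua_requests` is a made-up name for illustration.

```shell
# Sketch only: assumes Nginx "combined" log lines on stdin.
# Fields when splitting on '"':
#   $1 = 'IP - user [time] ', $2 = request, $4 = referer, $6 = User-Agent
# Prints "IP<TAB>User-Agent" for lines whose User-Agent contains "Google".
google_ua_requests() {
  awk -F'"' '$6 ~ /Google/ {
    split($1, head, " ")      # head[1] is the client IP
    print head[1] "\t" $6
  }'
}

# Example (not run here):
#   cat access.log* | google_ua_requests | cut -f1 | sort | uniq -c | sort -n
```

Filtering on the field itself avoids counting a request just because "Google" shows up in the URL or the referer.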
Checking IPs by User-Agent
$ rg Google | awk -F\" '{print $1}' | awk -F: '{print $2}' | awk '{print $1}' | sort | uniq -c | sort -n
1 102.102.83.208
1 102.209.223.8
1 102.213.69.245
1 102.96.161.212
1 103.102.136.51
1 103.117.162.160
1 103.166.41.21
1 103.174.225.53
1 103.174.4.56
1 103.182.63.251
1 103.209.196.228
1 107.175.195.195
1 109.107.230.189
1 110.36.105.66
1 113.163.105.69
1 113.164.130.84
1 113.164.179.136
1 113.165.156.186
1 113.167.114.75
1 113.167.36.210
1 113.168.178.66
1 113.170.146.57
1 113.172.134.199
1 113.172.55.98
1 113.173.131.69
1 113.176.223.121
1 113.177.75.9
1 113.179.10.132
1 113.179.81.135
1 113.181.56.153
1 113.184.109.88
1 113.187.118.187
1 113.188.199.37
1 123.16.137.129
1 123.16.20.54
1 123.17.123.74
1 123.17.241.160
1 123.21.128.173
1 123.21.154.104
1 123.21.239.192
1 123.22.157.65
1 123.22.26.73
1 123.22.47.234
1 123.23.173.28
1 123.23.53.57
1 123.26.211.219
1 123.28.244.115
1 129.222.187.142
1 135.136.22.95
1 14.161.25.181
1 14.163.205.38
1 14.163.255.222
1 14.163.74.166
1 14.165.82.24
1 14.166.9.16
1 14.168.144.185
1 14.169.148.70
1 14.171.175.167
1 14.174.245.92
1 14.174.77.121
1 14.177.105.57
1 14.178.211.59
1 14.179.10.60
1 14.183.145.50
1 14.184.50.90
1 14.185.153.242
1 14.186.117.89
1 14.186.150.189
1 14.187.218.91
1 14.187.26.243
1 14.189.179.118
1 14.189.210.53
1 14.190.2.154
1 14.191.16.78
1 14.191.17.109
1 14.191.18.234
1 14.191.182.19
1 14.191.186.94
1 14.191.75.66
1 14.227.194.148
1 14.228.237.83
1 14.228.75.26
1 14.229.179.246
1 14.233.174.253
1 14.234.144.235
1 14.234.49.104
1 14.235.208.131
1 14.237.141.33
1 14.240.58.44
1 14.241.134.193
1 14.241.175.90
1 14.247.241.23
1 14.248.205.47
1 14.254.4.140
1 143.105.146.217
1 144.124.192.237
1 146.70.132.37
1 146.70.188.201
1 149.102.246.23
1 149.40.247.184
1 154.182.113.105
1 154.249.128.19
1 154.47.29.12
1 154.47.29.7
1 156.207.113.249
1 157.100.109.38
1 158.173.46.141
1 158.173.46.150
1 163.61.227.149
1 167.250.37.1
1 168.232.162.137
1 169.224.11.74
1 176.28.246.47
1 177.245.235.193
1 179.24.245.210
1 179.24.37.219
1 181.123.202.209
1 181.177.142.132
1 181.20.0.199
1 181.225.198.251
1 182.252.75.10
1 185.209.199.101
1 185.213.154.208
1 185.213.229.201
1 185.65.135.176
1 186.19.194.205
1 186.64.216.77
1 187.18.254.35
1 188.247.203.22
1 191.113.89.3
1 191.84.214.228
1 192.178.15.37
1 192.178.15.38
1 192.178.15.39
1 192.178.6.5
1 193.138.218.201
1 193.32.127.227
1 193.32.249.175
1 193.32.249.233
1 194.127.167.114
1 196.216.60.5
1 196.64.23.93
1 197.203.188.32
1 200.150.240.219
1 213.230.112.182
1 213.230.88.168
1 213.230.93.122
1 222.252.116.30
1 222.254.102.235
1 222.254.234.228
1 222.254.235.100
1 223.123.73.157
1 31.146.249.245
1 37.238.89.130
1 37.239.128.136
1 37.239.34.20
1 38.121.208.13
1 38.147.75.229
1 38.159.51.124
1 38.52.132.46
1 38.61.160.186
1 39.63.183.44
1 41.121.161.255
1 41.143.205.203
1 41.226.155.129
1 45.129.59.188
1 45.161.97.5
1 45.182.132.6
1 45.189.56.33
1 45.239.187.75
1 45.247.87.161
1 45.83.220.221
1 5.175.154.231
1 5.193.137.135
1 5.57.13.154
1 64.233.173.37
1 66.249.64.64
1 66.249.64.65
1 66.249.66.78
1 66.249.68.35
1 66.249.71.130
1 66.249.73.101
1 66.249.73.130
1 66.249.73.132
1 66.249.73.204
1 66.249.74.100
1 66.249.74.129
1 66.249.74.135
1 72.252.235.78
1 74.125.217.103
1 77.83.39.42
1 79.127.182.142
1 84.54.70.17
1 85.193.106.148
1 89.37.63.212
1 93.114.213.250
2 64.233.173.38
2 66.249.66.204
2 66.249.73.160
2 66.249.73.161
2 66.249.73.197
2 66.249.73.201
2 66.249.74.10
2 66.249.74.11
2 66.249.74.128
2 66.249.74.132
2 66.249.74.136
2 66.249.74.137
2 66.249.74.15
2 66.249.90.39
2 74.125.215.168
3 66.249.73.131
3 66.249.73.172
3 66.249.74.1
3 66.249.74.12
3 66.249.74.131
3 66.249.74.14
4 66.249.73.103
4 66.249.73.203
4 66.249.74.133
4 74.125.215.167
5 66.249.73.102
5 66.249.73.173
5 66.249.74.110
6 66.249.73.202
6 74.125.217.101
7 66.249.73.198
7 66.249.73.200
7 74.125.215.169
8 74.125.217.102
10 66.249.73.199
43 66.249.74.102
50 66.249.74.103
53 66.249.74.101
840 66.249.74.65
1243 66.249.74.64
2044 66.249.74.78
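I recognize the addresses starting with 66 or 74 to some extent, but there are still quite a few requests from unfamiliar IPv4 addresses, with IPs from Vietnam and Africa being especially common.
Google also publishes the relevant information on the following page.
Verify requests from Google's crawlers and fetchers | Google Crawling Infrastructure | Crawling infrastructure | Google for Developers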
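The manual check described on that page is a reverse DNS lookup on the IP, a look at the domain of the returned name, and then a forward lookup on that name to confirm it maps back to the same IP. Here is a rough sketch of that flow, not a definitive implementation. The helper names (`reverse_lookup`, `forward_lookup`, `verify_google_ip`) are made up, the two domain suffixes are simply the ones that show up in this post's results (the official page is the authority on the full list), and it only handles IPv4 via glibc's `getent`.

```shell
# Hypothetical helpers built on getent; swap in `host` or `dig` if preferred.
reverse_lookup() { getent hosts "$1" | awk '{ print $2; exit }'; }   # IP -> name
forward_lookup() { getent ahostsv4 "$1" | awk '{ print $1 }'; }      # name -> IPv4s

# Succeeds (exit 0) only if the PTR name is under an expected Google domain
# AND that name resolves back to the original IP.
verify_google_ip() {
  ip=$1
  name=$(reverse_lookup "$ip")
  case "$name" in
    *.googlebot.com|*.google.com) ;;   # assumed suffix list, taken from this post
    *) return 1 ;;                     # no PTR record, or some other domain
  esac
  forward_lookup "$name" | grep -qxF "$ip"
}

# Example (needs network, not run here):
#   verify_google_ip 66.249.74.64 && echo "looks like Google"
```

The forward step matters because PTR records are set by whoever controls the IP block, so a reverse name alone can claim anything.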
Reverse-resolved hosts of the self-proclaimed Googlebots
$ rg Google | awk -F\" '{print $1}' | awk -F: '{print $2}' | awk '{print $1}' | sort -u | xargs -I{} getent hosts {}
102.209.223.8 OCI-102.209.223.8.aviso.ci
103.174.225.53 ns3.virtualhostbd.net
107.175.195.195 107-175-195-195-host.colocrossing.com
110.36.105.66 GPONUser36105-66.wateen.net
113.163.105.69 static.vnpt.vn
113.164.130.84 static.vnpt.vn
113.164.179.136 static.vnpt.vn
113.165.156.186 static.vnpt.vn
113.167.114.75 static.vnpt.vn
113.167.36.210 static.vnpt.vn
113.168.178.66 static.vnpt.vn
113.170.146.57 static.vnpt.vn
113.172.134.199 static.vnpt.vn
113.172.55.98 static.vnpt.vn
113.173.131.69 static.vnpt.vn
113.176.223.121 static.vnpt.vn
113.177.75.9 static.vnpt.vn
113.179.10.132 static.vnpt.vn
113.179.81.135 static.vnpt.vn
113.181.56.153 static.vnpt.vn
113.184.109.88 static.vnpt.vn
113.187.118.187 static.vnpt.vn
113.188.199.37 static.vnpt.vn
123.16.137.129 static.vnpt.vn
123.16.20.54 static.vnpt.vn
123.17.123.74 static.vnpt.vn
123.17.241.160 static.vnpt.vn
123.26.211.219 static.vnpt.vn
123.28.244.115 localhost
129.222.187.142 customer.nrbiken1.isp.starlink.com
14.161.25.181 static.vnpt.vn
14.163.205.38 static.vnpt.vn
14.163.255.222 static.vnpt.vn
14.163.74.166 static.vnpt.vn
14.165.82.24 static.vnpt.vn
14.166.9.16 static.vnpt.vn
14.168.144.185 static.vnpt.vn
14.169.148.70 static.vnpt.vn
14.171.175.167 static.vnpt.vn
14.174.245.92 static.vnpt.vn
14.174.77.121 static.vnpt.vn
14.177.105.57 static.vnpt.vn
14.178.211.59 static.vnpt.vn
14.179.10.60 static.vnpt.vn
14.183.145.50 static.vnpt.vn
14.184.50.90 static.vnpt.vn
14.185.153.242 static.vnpt.vn
14.186.117.89 static.vnpt.vn
14.186.150.189 static.vnpt.vn
14.187.218.91 static.vnpt.vn
14.187.26.243 static.vnpt.vn
14.189.179.118 static.vnpt.vn
14.189.210.53 static.vnpt.vn
14.190.2.154 static.vnpt.vn
14.191.16.78 static.vnpt.vn
14.191.17.109 static.vnpt.vn
14.191.18.234 static.vnpt.vn
14.191.182.19 static.vnpt.vn
14.191.186.94 static.vnpt.vn
14.191.75.66 static.vnpt.vn
14.227.194.148 static.vnpt.vn
14.228.237.83 static.vnpt.vn
14.228.75.26 static.vnpt.vn
14.229.179.246 static.vnpt.vn
14.233.174.253 static.vnpt.vn
14.234.144.235 static.vnpt.vn
14.234.49.104 static.vnpt.vn
14.235.208.131 static.vnpt.vn
14.237.141.33 static.vnpt.vn
14.240.58.44 static.vnpt.vn
14.241.134.193 static.vnpt.vn
14.241.175.90 static.vnpt.vn
14.247.241.23 static.vnpt.vn
14.248.205.47 static.vnpt.vn
14.254.4.140 static.vnpt.vn
143.105.146.217 customer.bgtacol1.isp.starlink.com
149.102.246.23 unn-149-102-246-23.datapacket.com
154.47.29.12 unn-154-47-29-12.datapacket.com
154.47.29.7 unn-154-47-29-7.datapacket.com
157.100.109.38 host-157-100-109-38.ecua.net.ec
167.250.37.1 167-250-37-1.ips-dinamicos.sol.com.py
168.232.162.137 168-232-162-137.static.sumicity.net.br
177.245.235.193 customer-LEON-CGN-235-193.megared.net.mx
179.24.245.210 r179-24-245-210.dialup.adsl.anteldata.net.uy
179.24.37.219 r179-24-37-219.dialup.adsl.anteldata.net.uy
181.123.202.209 pool-209-202-123-181.telecel.com.py
181.177.142.132 static-181-177-142-132.supernet.com.bo
181.20.0.199 181-20-0-199.speedy.com.ar
186.19.194.205 cpe-186-19-194-205.telecentro-reversos.com.ar
186.64.216.77 ip77-216-64-186.ct.co.cr
191.113.89.3 191-113-89-3.baf.movistar.cl
192.178.15.37 google-proxy-192-178-15-37.google.com
192.178.15.38 google-proxy-192-178-15-38.google.com
192.178.15.39 google-proxy-192-178-15-39.google.com
192.178.6.5 crawl-192-178-6-5.googlebot.com
213.230.112.182 182.64.uzpak.uz
213.230.88.168 168.64.uzpak.uz
213.230.93.122 122.64.uzpak.uz
222.252.116.30 static.vnpt.vn
222.254.102.235 static.vnpt.vn
222.254.234.228 static.vnpt.vn
222.254.235.100 static.vnpt.vn
38.121.208.13 38-121-208-13.galanet.com.ve
38.52.132.46 132-52-38-46.giganet.net.py
45.161.97.5 45-161-97-5.log.inf.br
45.182.132.6 clientes-132.6.dbug.com.br
45.189.56.33 33-56-189-45.cbvision.net.ec
45.239.187.75 45-239-187-75.cooperatelecom.com.br
5.57.13.154 5-57-13-154.elcat.kg
64.233.173.37 google-proxy-64-233-173-37.google.com
64.233.173.38 google-proxy-64-233-173-38.google.com
66.249.64.64 crawl-66-249-64-64.googlebot.com
66.249.64.65 crawl-66-249-64-65.googlebot.com
66.249.66.204 crawl-66-249-66-204.googlebot.com
66.249.66.78 crawl-66-249-66-78.googlebot.com
66.249.68.35 crawl-66-249-68-35.googlebot.com
66.249.71.130 crawl-66-249-71-130.googlebot.com
66.249.73.101 crawl-66-249-73-101.googlebot.com
66.249.73.102 crawl-66-249-73-102.googlebot.com
66.249.73.103 crawl-66-249-73-103.googlebot.com
66.249.73.130 crawl-66-249-73-130.googlebot.com
66.249.73.131 crawl-66-249-73-131.googlebot.com
66.249.73.132 crawl-66-249-73-132.googlebot.com
66.249.73.160 crawl-66-249-73-160.googlebot.com
66.249.73.161 crawl-66-249-73-161.googlebot.com
66.249.73.172 crawl-66-249-73-172.googlebot.com
66.249.73.173 crawl-66-249-73-173.googlebot.com
66.249.73.197 crawl-66-249-73-197.googlebot.com
66.249.73.198 crawl-66-249-73-198.googlebot.com
66.249.73.199 crawl-66-249-73-199.googlebot.com
66.249.73.200 crawl-66-249-73-200.googlebot.com
66.249.73.201 crawl-66-249-73-201.googlebot.com
66.249.73.202 crawl-66-249-73-202.googlebot.com
66.249.73.203 crawl-66-249-73-203.googlebot.com
66.249.73.204 crawl-66-249-73-204.googlebot.com
66.249.74.1 crawl-66-249-74-1.googlebot.com
66.249.74.10 crawl-66-249-74-10.googlebot.com
66.249.74.100 crawl-66-249-74-100.googlebot.com
66.249.74.101 crawl-66-249-74-101.googlebot.com
66.249.74.102 crawl-66-249-74-102.googlebot.com
66.249.74.103 crawl-66-249-74-103.googlebot.com
66.249.74.11 crawl-66-249-74-11.googlebot.com
66.249.74.110 crawl-66-249-74-110.googlebot.com
66.249.74.12 crawl-66-249-74-12.googlebot.com
66.249.74.128 crawl-66-249-74-128.googlebot.com
66.249.74.129 crawl-66-249-74-129.googlebot.com
66.249.74.131 crawl-66-249-74-131.googlebot.com
66.249.74.132 crawl-66-249-74-132.googlebot.com
66.249.74.133 crawl-66-249-74-133.googlebot.com
66.249.74.135 crawl-66-249-74-135.googlebot.com
66.249.74.136 crawl-66-249-74-136.googlebot.com
66.249.74.137 crawl-66-249-74-137.googlebot.com
66.249.74.14 crawl-66-249-74-14.googlebot.com
66.249.74.15 crawl-66-249-74-15.googlebot.com
66.249.74.64 crawl-66-249-74-64.googlebot.com
66.249.74.65 crawl-66-249-74-65.googlebot.com
66.249.74.78 crawl-66-249-74-78.googlebot.com
66.249.90.39 rate-limited-proxy-66-249-90-39.google.com
74.125.215.167 google-proxy-74-125-215-167.google.com
74.125.215.168 google-proxy-74-125-215-168.google.com
74.125.215.169 google-proxy-74-125-215-169.google.com
74.125.217.101 rate-limited-proxy-74-125-217-101.google.com
74.125.217.102 rate-limited-proxy-74-125-217-102.google.com
74.125.217.103 rate-limited-proxy-74-125-217-103.google.com
79.127.182.142 unn-79-127-182-142.datapacket.com
So Google keeps separate IPs for rate-limited use (the rate-limited-proxy hosts). That makes sense.
geoiplookup
$ rg Google | awk -F\" '{print $1}' | awk -F: '{print $2}' | awk '{print $1}' | sort -u | xargs -I{} geoiplookup {} | sort | uniq -c | sort -n
1 GeoIP Country Edition: AE, United Arab Emirates
1 GeoIP Country Edition: BG, Bulgaria
1 GeoIP Country Edition: BO, Bolivia
1 GeoIP Country Edition: CH, Switzerland
1 GeoIP Country Edition: CI, Cote D'Ivoire
1 GeoIP Country Edition: CR, Costa Rica
1 GeoIP Country Edition: EE, Estonia
1 GeoIP Country Edition: ET, Ethiopia
1 GeoIP Country Edition: GB, United Kingdom
1 GeoIP Country Edition: GE, Georgia
1 GeoIP Country Edition: GR, Greece
1 GeoIP Country Edition: HU, Hungary
1 GeoIP Country Edition: JM, Jamaica
1 GeoIP Country Edition: KE, Kenya
1 GeoIP Country Edition: KG, Kyrgyzstan
1 GeoIP Country Edition: MX, Mexico
1 GeoIP Country Edition: NA, Namibia
1 GeoIP Country Edition: RO, Romania
1 GeoIP Country Edition: SY, Syrian Arab Republic
1 GeoIP Country Edition: TN, Tunisia
1 GeoIP Country Edition: UA, Ukraine
1 GeoIP Country Edition: ZA, South Africa
2 GeoIP Country Edition: CL, Chile
2 GeoIP Country Edition: DK, Denmark
2 GeoIP Country Edition: DZ, Algeria
2 GeoIP Country Edition: EC, Ecuador
2 GeoIP Country Edition: HR, Croatia
2 GeoIP Country Edition: JO, Jordan
2 GeoIP Country Edition: KZ, Kazakhstan
2 GeoIP Country Edition: NL, Netherlands
2 GeoIP Country Edition: UY, Uruguay
3 GeoIP Country Edition: EG, Egypt
3 GeoIP Country Edition: PY, Paraguay
4 GeoIP Country Edition: AR, Argentina
4 GeoIP Country Edition: MA, Morocco
5 GeoIP Country Edition: BR, Brazil
5 GeoIP Country Edition: IQ, Iraq
5 GeoIP Country Edition: VE, Venezuela
6 GeoIP Country Edition: BD, Bangladesh
6 GeoIP Country Edition: PK, Pakistan
6 GeoIP Country Edition: UZ, Uzbekistan
7 GeoIP Country Edition: SE, Sweden
61 GeoIP Country Edition: US, United States
82 GeoIP Country Edition: VN, Vietnam
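It looks like there is a lot of abuse from Vietnamese IPs.
Since they go as far as impersonating a well-known User-Agent like Google's, these IPs are quite malicious, and it may already be reasonable to set up a rule like "ban outright if the reverse lookup result is not Google."
Even a coarse check that the reverse lookup result is *.google.com or *.googlebot.com seems like it would be enough.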
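As a rough sketch of that coarse rule, the filter below reads `IP HOSTNAME` lines (the shape of the `getent hosts` output above) and prints the IPs that would be ban candidates. `non_google_ips` is a made-up name, and treating an IP with no reverse name at all as a candidate is my own assumption about how the rule should behave.

```shell
# Sketch only: reads "IP [HOSTNAME]" lines on stdin.
# Prints the IP when the hostname is missing or does not end in
# .google.com or .googlebot.com.
non_google_ips() {
  awk '$2 !~ /\.(google|googlebot)\.com$/ { print $1 }'
}

# Example (needs network, not run here). IPs without a PTR record
# produce a line with an empty hostname, so they are printed too:
#   while read -r ip; do
#     printf '%s %s\n' "$ip" "$(getent hosts "$ip" | awk '{ print $2; exit }')"
#   done < ips.txt | non_google_ips
```

Anchoring the pattern at the end of the name keeps something like `google.com.example.net` from slipping through. How the resulting list gets turned into an actual ban (firewall, Nginx deny rules, and so on) is left open here.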