全球主机交流论坛 (Global Hosting Discussion Forum)


[Question] How do I block fake Baidu spiders?

Posted on 2022-1-20 08:23:11
    223.111.134.143 - - [19/Jan/2022:10:50:47 +0800] "GET /member/space/person/common/css/css.css HTTP/1.1" 404 146 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"


There are a lot of these, all from IPs inside China.
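To see what that \x09 at the end of the User-Agent actually is, it helps to parse the line. nginx's default log escaping writes the double quote, the backslash and any byte below 32 or above 126 as \xHH, so the \x09 is most likely a tab byte the client sent, not four literal characters. A minimal sketch, assuming the default "combined" log_format; parse_line and unescape_nginx are hypothetical names:

```python
import re
from typing import Optional

# Assumed: nginx's default "combined" log_format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# nginx's default escaping writes '"', '\' and non-printable bytes as \xHH.
ESCAPE = re.compile(r'\\x([0-9A-Fa-f]{2})')


def unescape_nginx(field: str) -> str:
    """Map each \\xHH sequence back to one character (fine for ASCII control bytes like tab)."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), field)


def parse_line(line: str) -> Optional[dict]:
    """Split one combined-format line into named fields, with the User-Agent unescaped."""
    m = COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["ua"] = unescape_nginx(fields["ua"])
    return fields


if __name__ == "__main__":
    line = r'223.111.134.143 - - [19/Jan/2022:10:50:47 +0800] "GET /member/space/person/common/css/css.css HTTP/1.1" 404 146 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"'
    rec = parse_line(line)
    print(rec["ip"])              # 223.111.134.143
    print(repr(rec["ua"][-12:]))  # 'spider.html\t'
```

Read this way, the fake UA differs from a normal Baiduspider UA mainly in its trailing tab.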
OP | Posted on 2022-1-20 08:27:05
I tried writing it like this in nginx, but it still doesn't catch them:
    if ($http_user_agent ~* "spider.html\x09") {
        return 404;
    }
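nginx evaluates $http_user_agent against the raw header value, not against the escaped text in the access log. Assuming the logged \x09 is an escaped tab byte, and assuming nginx hands this pattern to PCRE with \x09 intact, the regex itself should match; a quick check with Python's re, which treats \x09 the same way, illustrates this. It says nothing about whether the if block sits in the server or location that actually handles these requests. The variable names are hypothetical:

```python
import re

# The OP's pattern, case-insensitive like nginx's "~*".
# Inside a regular expression, \x09 means the tab byte (PCRE and Python's re agree).
op_pattern = re.compile(r"spider.html\x09", re.IGNORECASE)

# What the client most likely sent: the header ends in a real tab character
# (assumption based on nginx escaping non-printable bytes as \xHH in the log).
header_value = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\t"

# What the access log shows: backslash, 'x', '0', '9' as four literal characters.
logged_text = r"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"

print(bool(op_pattern.search(header_value)))  # True: a tab follows "spider.html"
print(bool(op_pattern.search(logged_text)))   # False: a backslash follows "spider.html"
```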
Posted on 2022-1-20 08:29:36
$http_user_agent ~* "spider.html..09"
OP | Posted on 2022-1-20 08:31:04
domin posted on 2022-1-20 08:29
$http_user_agent ~* "spider.html..09"

Thanks a lot, I'll give it a try.
OP | Posted on 2022-1-21 07:53:54
domin posted on 2022-1-20 08:29
$http_user_agent ~* "spider.html..09"

Judging by the results, it didn't match:
    27.159.66.46 - - [21/Jan/2022:06:12:47 +0800] "GET /data/admin/allowurl.txt HTTP/1.1" 503 16559 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:47 +0800] "GET /templets/default/style/dedecms.css HTTP/1.1" 404 146 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:50 +0800] "GET /member/images/base.css HTTP/1.1" 404 146 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:50 +0800] "GET /dede/templets/article_coonepage_rule.htm HTTP/1.1" 503 16559 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:53 +0800] "GET /templets/default/style/dedecms.css HTTP/1.1" 404 146 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:53 +0800] "GET /templets/default/style/dedecms.css HTTP/1.1" 404 146 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:54 +0800] "GET /data/admin/ver.txt HTTP/1.1" 503 16559 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:55 +0800] "GET /data/admin/ver.txt HTTP/1.1" 503 16559 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:56 +0800] "GET /data/cache/index.htm HTTP/1.1" 503 16559 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:59 +0800] "GET /data/admin/ver.txt HTTP/1.1" 503 16559 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
    27.159.66.46 - - [21/Jan/2022:06:12:59 +0800] "GET /include/data/vdcode.jpg HTTP/1.1" 302 138 "https://www.baidu.com/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
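That result is what the regex predicts if the header really ends in a tab: in "spider.html..09" each dot matches one arbitrary character, so the pattern needs two characters followed by the literal digits 09. That fits the escaped text in the log file (backslash, x, 0, 9), not the raw header nginx evaluates. The sketch checks this with Python's re; tab_ending is a hypothetical alternative not proposed in the thread, and genuine_ua is an assumed form of Baidu's real User-Agent:

```python
import re

# The suggested pattern: "..09" = any two characters, then the digits "0" and "9".
suggested = re.compile(r"spider.html..09", re.IGNORECASE)

# Hypothetical alternative: a Baiduspider UA whose very last character is a tab.
tab_ending = re.compile(r"Baiduspider.*\t$", re.IGNORECASE)

fake_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\t"
logged_text = r"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\x09"
# Assumed form of the genuine UA, which ends in ")" rather than a tab.
genuine_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

print(bool(suggested.search(fake_ua)))      # False: only one character (the tab) follows "html"
print(bool(suggested.search(logged_text)))  # True: "\x" fills the two dots
print(bool(tab_ending.search(fake_ua)))     # True
print(bool(tab_ending.search(genuine_ua)))  # False
```

Either way, a User-Agent rule only catches this one variant; the bot can drop the tab at any time.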
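Posted on 2022-1-21 08:18:55
How could the User-Agent possibly stop fake spiders? Anyone can set the User-Agent to whatever they like. The only reliable way to identify them is a reverse DNS lookup on the IP, or keeping a list of known spider IPs to check against.
For example, Baiduspider hostnames follow the pattern *.baidu.com or *.baidu.jp; a hostname that is not under *.baidu.com or *.baidu.jp is an impostor.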

host 111.206.198.69
69.198.206.111.in-addr.arpa domain name pointer baiduspider-111-206-198-69.crawl.baidu.com.
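The lookup above can be scripted. A PTR record is set by whoever controls the IP block, so an impostor can point their own PTR at a *.baidu.com name; the usual extra step is to resolve the returned hostname forward again and require that it maps back to the same IP (forward-confirmed reverse DNS). A minimal sketch using only Python's socket module, assuming the suffixes given in the post; the function names are hypothetical, and the lookups block, so results should be cached rather than checked per request:

```python
import socket

# Suffixes taken from the post above (an assumption; Baidu may use others).
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname: str) -> bool:
    """True if the name ends in one of the suffixes (trailing dot and case ignored)."""
    host = hostname.rstrip(".").lower()
    return host.endswith(BAIDU_SUFFIXES)


def is_real_baiduspider(ip: str) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must be under a Baidu domain
    AND must resolve back to the same IPv4 address."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not is_baidu_hostname(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)  # forward lookup
    except OSError:
        return False
    return ip in forward_ips
```

If the records still look like the host output above, is_real_baiduspider("111.206.198.69") should return True, while an IP whose PTR is missing or points elsewhere returns False.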
Posted on 2022-1-21 08:20:44
You could first save the IPs whose User-Agent says they are a spider, then run host on them in bulk and block the ones that don't check out.
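The batch approach described here can be sketched as follows, assuming nginx's default combined format (client IP first, User-Agent as the last quoted field). Each distinct IP is checked only once; verify is a placeholder for whatever check you use, such as a host or reverse-DNS lookup, and the function names are hypothetical:

```python
import re
from typing import Callable, Iterable

# Assumes nginx's default "combined" format: client IP is the first field and
# the User-Agent is the last double-quoted field on the line.
LINE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def claimed_spider_ips(lines: Iterable[str], token: str = "baiduspider") -> set[str]:
    """Distinct client IPs whose User-Agent contains token (token must be lowercase)."""
    ips = set()
    for line in lines:
        m = LINE.match(line)
        if m and token in m.group(2).lower():
            ips.add(m.group(1))
    return ips


def ips_to_block(lines: Iterable[str], verify: Callable[[str], bool]) -> list[str]:
    """Run verify once per distinct claimed-spider IP; return the failing IPs, sorted."""
    return sorted(ip for ip in claimed_spider_ips(lines) if not verify(ip))
```

An open access-log file can be passed directly as lines. Because the set deduplicates first, a burst like the 11 requests from 27.159.66.46 above costs a single lookup.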
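Posted on 2022-1-21 08:24:06
Use the smart DNS resolution (routing traffic by source) offered by one of the big domestic cloud providers; after that it's easy to deal with.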
OP | Posted on 2022-1-21 08:56:55
konks posted on 2022-1-21 08:20
You could first save the IPs whose User-Agent says they are a spider, then run host on them in bulk and block the ones that don't check out ...

That's not very efficient, and these are domestic IPs.
I might as well just block the IP ranges for a short while.
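Blocking ranges can also be scripted: the offending IPs are collapsed into networks and turned into deny directives for nginx's ngx_http_access_module. A sketch, assuming IPv4 addresses, a /24 grouping and a minimum hit count (all arbitrary choices, not from the thread); the function names are hypothetical:

```python
import ipaddress
from collections import Counter
from typing import Iterable


def ranges_to_block(ips: Iterable[str], prefix: int = 24,
                    min_hits: int = 1) -> list[ipaddress.IPv4Network]:
    """Group IPv4 addresses into /prefix networks; keep those seen at least min_hits times."""
    counts = Counter(
        # strict=False zeroes the host bits: 27.159.66.46/24 -> 27.159.66.0/24
        ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)
        for ip in ips
    )
    return sorted(net for net, hits in counts.items() if hits >= min_hits)


def nginx_deny_lines(networks: Iterable[ipaddress.IPv4Network]) -> list[str]:
    """Format networks as ngx_http_access_module deny directives."""
    return [f"deny {net};" for net in networks]


if __name__ == "__main__":
    seen = ["27.159.66.46", "27.159.66.46", "223.111.134.143"]  # IPs from this thread
    for line in nginx_deny_lines(ranges_to_block(seen)):
        print(line)  # deny 27.159.66.0/24;  then  deny 223.111.134.0/24;
```

To make the block temporary, the lines could go into a separate file pulled in with include; emptying that file and reloading nginx lifts it. A wider prefix catches more of the bot's addresses but also more unrelated users.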