This topic was created 4195 days ago; the information in it may have since developed or changed.
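This is an excerpt from my access log. /{xxx} stands for a path on my site; everything else is the raw, unmodified log.
The crawler's IP isn't fixed: after I ban one, a new IP shows up a while later and keeps crawling.
This crawler started hitting my site about two years ago. I then shut the site down for roughly a year, and now that I've reopened it, the crawler is somehow still here. I have no idea what it's after, and it quite possibly kept crawling the whole time the site was down. Can anyone help me analyze it?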
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1"
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a2" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a0" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a4" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.10 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
89.248.162.170 - - [05/Aug/2014:04:42:37 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a2" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
89.248.162.170 - - [05/Aug/2014:04:42:37 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a9" "Mozilla/6.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
89.248.162.170 - - [05/Aug/2014:04:42:37 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
94.102.49.31 - - [05/Aug/2014:04:42:56 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a6" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_4) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:56 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a8" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/10.10 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:57 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a1" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.04 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a1" "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a7" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.10 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:43:00 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a1" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
94.102.49.31 - - [05/Aug/2014:04:43:01 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a2" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.16) Gecko/20120427 Firefox/15.0a1"
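To quantify the pattern in the excerpt above (several requests per second from one IP, a different User-Agent on nearly every request, and a Referer of the form `http://www.google.com/#q=aN`), here is a minimal Python sketch that parses lines in this combined log format and summarizes them per IP. It assumes every line follows exactly the layout shown; the function `summarize` and its field names are hypothetical, made up for illustration.

```python
import re
from collections import defaultdict

# Combined log format, as in the excerpt:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Group log entries by client IP and collect simple bot indicators."""
    stats = defaultdict(
        lambda: {"requests": 0, "agents": set(), "fragment_referers": 0}
    )
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined log format
        s = stats[m["ip"]]
        s["requests"] += 1
        s["agents"].add(m["agent"])  # distinct User-Agent strings per IP
        # A Referer containing "#" carries a URL fragment, which a
        # standards-following browser does not send.
        if "#" in m["referer"]:
            s["fragment_referers"] += 1
    return stats
```

On a real log you would pass an open file object (for example the hypothetical `open("access.log")`) and look for IPs where the number of distinct agents is close to the request count.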
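15 replies | 2014-08-05 15:25:55 +08:00
#2 sanp, Aug 5, 2014: @liangdi I hit Enter and it posted on its own... I've edited it now.
#3 plprapper, Aug 5, 2014: That's not a crawler, that's a shameless pest that won't go away...
#4 liangdi, Aug 5, 2014: Sounds like a scraper. What kind of site is it, OP?
#5 sintrb, Aug 5, 2014: Poor crawler...
#8 ChanneW, Aug 5, 2014: How can you tell it's not the real Google?
#11 sanp, Aug 5, 2014: @liangdi It's a utility site for looking up data, and they're scraping it by enumerating every entry. What puzzles me is that my site was down for over a year, and now that I've reopened it, they're still at it.
#12 sanp, Aug 5, 2014
#13 sanp, Aug 5, 2014: @plprapper Exactly. Normally you just ban a crawler and that's the end of it. With this one, a different IP shows up a while after each ban, and it crawls very frequently, basically nonstop.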
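Since banning individual IPs doesn't stick here, one option is to match on what the requests look like rather than where they come from. The Python sketch below shows two building blocks under that assumption; it is not something anyone in the thread deployed. `has_fragment_referer` flags a Referer that contains a `#fragment` (RFC 7231 §5.5.2 says a user agent must not include the fragment when generating the Referer field, so every line in the excerpt trips this check), and `SlidingWindowLimiter` caps requests per client key within a time window. Both names and the example thresholds are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional
from urllib.parse import urlsplit


def has_fragment_referer(referer: str) -> bool:
    """True if the Referer value carries a #fragment.

    RFC 7231 section 5.5.2 forbids user agents from sending the fragment,
    so this is a strong (though not infallible) sign of a forged header.
    """
    return urlsplit(referer).fragment != ""


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        # key (e.g. client IP) -> timestamps of recently allowed requests
        self.hits = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        # Drop timestamps that have aged out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit; rejected hits are not recorded
        q.append(now)
        return True
```

Rejected requests are not recorded, so a client gets at most `limit` requests per window no matter how hard it retries. Keying the limiter on something other than the IP (for example a request signature) would be needed if the crawler keeps rotating addresses.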