Why is downloading an image with urllib in Python 3 so slow?

This topic was created 3215 days ago; the information in it may have since changed.

I'm a beginner teaching myself to write a crawler, learning as I go.

I want to download an image from the "Y site" (yande.re). The code is:

```python
urllib.request.urlopen('http://xxx.jpg').read()
```

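The URL itself opens fine. The image is small, and a browser loads it in a few seconds (caching ruled out). In Python, however, the download takes 30+ seconds, even though the data, once written to a file, displays correctly.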

So the question is: what makes downloading a single image this slow?

Is there something else I need to configure?

Full code:

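```python
import datetime
import os
import time
import urllib.request

# `info` is produced by earlier scraping code not shown in the post.
# info[1] holds the image URL, e.g.
# https://files.yande.re/sample/6718a8caa71a4547a417f41bc9f063bb/yande.re%20385001%20sample%20byakuya_reki%20seifuku.jpg

# Create a directory for the images scraped today
dir_name = datetime.datetime.now().strftime('%Y%m%d')
if not os.path.exists(dir_name):
    os.mkdir(dir_name)

print('Starting download...')
print(info[1])
i = time.time()
img = urllib.request.urlopen(info[1]).read()
print('Download finished. Elapsed: ' + str(int(time.time() - i)) + 's')

# Take the file name from the URL and turn %20 back into spaces
file_name = info[1].split('/')[-1].replace('%20', ' ')
with open(dir_name + '/' + file_name, 'wb') as file:
    file.write(img)
exit(200)
```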
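The save step in the posted code only turns `%20` into spaces. A slightly more general sketch, using hypothetical helper names `local_file_name` and `save_image` (not from the post), decodes every percent-escape with `urllib.parse.unquote`, ignores any query string, and lets `os.makedirs(..., exist_ok=True)` replace the exists-then-mkdir check:

```python
import os
import urllib.parse


def local_file_name(url):
    """Return the URL's last path segment with every %XX escape decoded."""
    path = urllib.parse.urlsplit(url).path  # drops any ?query or #fragment
    return urllib.parse.unquote(path.rsplit('/', 1)[-1])


def save_image(data, url, dir_name):
    """Write data into dir_name (created if missing) and return the file path."""
    os.makedirs(dir_name, exist_ok=True)
    path = os.path.join(dir_name, local_file_name(url))
    with open(path, 'wb') as f:  # the with-block closes the file even on error
        f.write(data)
    return path
```

With the URL from the post, `local_file_name(info[1])` gives `yande.re 385001 sample byakuya_reki seifuku.jpg`, the same name the original `replace('%20', ' ')` produces, but other escapes such as `%5B` are decoded too.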

Addendum 1, 2017-02-27 21:52:24 +08:00

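After testing, it turns out the site throttles crawlers.
After adding UA (User-Agent), Host, Referer and similar headers, everything works normally. XD Thanks, everyone.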
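A minimal sketch of that fix with the standard library. The header names follow the addendum, but the concrete values are assumptions: `BROWSER_UA` is an illustrative browser-like string, and the Referer passed in is whatever page the site expects (the post does not say). `build_request` and `download` are hypothetical helper names. Host is included because the addendum mentions it, although urllib already derives it from the URL when it is absent.

```python
import time
import urllib.parse
import urllib.request

# Illustrative browser-like User-Agent (an assumption; any realistic one would do)
BROWSER_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/56.0.2924.87 Safari/537.36')


def build_request(url, referer):
    """Build a GET request carrying User-Agent, Host and Referer headers."""
    headers = {
        'User-Agent': BROWSER_UA,
        # urllib would fill Host in from the URL anyway; set explicitly here
        'Host': urllib.parse.urlsplit(url).netloc,
        'Referer': referer,
    }
    return urllib.request.Request(url, headers=headers)


def download(url, referer, timeout=30):
    """Fetch url with the headers above; return (bytes, elapsed seconds)."""
    req = build_request(url, referer)
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    return data, time.perf_counter() - start
```

In the original script this would replace the bare `urlopen` call with something like `img, elapsed = download(info[1], referer='https://yande.re/')`, where the Referer value is again an assumption. The `timeout` argument makes a stalled connection raise instead of hanging indefinitely.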

17 replies, 2017-02-28 23:02:07 +08:00