Python Scrapy crawler problem

When I crawl job listings from Zhaopin (zhaopin.com) with the Scrapy framework, I get errors I can't make sense of:
2019-04-09 23:29:10 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:10 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/url {"url": "https://zhaopin.com", "sessionId": "b97f6963939467e28aa83493fcf91f9d"}
[7964:9720:0409/232912.471:ERROR:ssl_client_socket_impl.cc(964)] handshake failed; returned -1, SSL error code 1, net_error -100
[7964:9720:0409/232912.505:ERROR:ssl_client_socket_impl.cc(964)] handshake failed; returned -1, SSL error code 1, net_error -100
[7964:10376:0409/232913.146:ERROR:platform_sensor_reader_win.cc(242)] NOT IMPLEMENTED
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "POST /session/b97f6963939467e28aa83493fcf91f9d/url HTTP/1.1" 200 72
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/window_handle {"sessionId": "b97f6963939467e28aa83493fcf91f9d"}
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "GET /session/b97f6963939467e28aa83493fcf91f9d/window_handle HTTP/1.1" 200 111
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/element {"using": "class name", "value": "zp-search__input", "sessionId": "b97f6963939467e28aa83493fcf91f9d"}
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "POST /session/b97f6963939467e28aa83493fcf91f9d/element HTTP/1.1" 200 102
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
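The log mixes two sources. Lines that start with a date and a [logger] name are Python logging records: Selenium's DEBUG trace of its HTTP calls to chromedriver on 127.0.0.1:50132, plus urllib3's responses. The bracketed [7964:9720:...:ERROR:...] lines are printed by the Chrome process itself. A minimal stdlib sketch for telling them apart, assuming Scrapy's default log format and the Chromium prefix visible above; the function names are hypothetical:

```python
import re
from typing import Dict, Iterable, List, Optional

# Scrapy's default LOG_FORMAT is "%(asctime)s [%(name)s] %(levelname)s: %(message)s",
# which is how the Selenium and urllib3 DEBUG records appear in the log.
PYTHON_RECORD = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<logger>[^\]]+)\] (?P<level>[A-Z]+): (?P<message>.*)$"
)
# Chrome's own console output, as seen in the log:
# [pid:tid:MMDD/HHMMSS.mmm:LEVEL:source_file(line)] message
CHROME_RECORD = re.compile(
    r"^\[(?P<pid>\d+):(?P<tid>\d+):(?P<stamp>\d{4}/\d{6}\.\d+):"
    r"(?P<level>[A-Z_]+):(?P<source>[^\]]+)\] (?P<message>.*)$"
)
# urllib3 lines end with: "<METHOD> <path> HTTP/1.1" <status> <length>
HTTP_STATUS = re.compile(r'HTTP/1\.1" (?P<status>\d{3}) \d+$')


def classify(line: str) -> Dict[str, Optional[str]]:
    """Report whether a log line came from Python logging or from the Chrome process."""
    match = PYTHON_RECORD.match(line)
    if match:
        return {"origin": "python:" + match["logger"], "level": match["level"], "message": match["message"]}
    match = CHROME_RECORD.match(line)
    if match:
        return {"origin": "chrome:" + match["source"], "level": match["level"], "message": match["message"]}
    return {"origin": "unknown", "level": None, "message": line}


def webdriver_statuses(lines: Iterable[str]) -> List[int]:
    """Collect the HTTP status codes chromedriver sent back to Selenium."""
    statuses = []
    for line in lines:
        match = HTTP_STATUS.search(line)
        if match:
            statuses.append(int(match["status"]))
    return statuses
```

Run over the excerpt above, the three ERROR lines are classified as chrome:ssl_client_socket_impl.cc(964) (twice) and chrome:platform_sensor_reader_win.cc(242), while every urllib3 line reports status 200, including the navigation to https://zhaopin.com and the find_element call that follows. That suggests the errors come from Chrome's console rather than from Scrapy or Selenium.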

Here is the code:
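import time

import scrapy
from scrapy import Request
from scrapy.linkextractors import LinkExtractor
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

from ..items import JobItem  # assumes JobItem is defined in this project's items.py


class JobsSpider(scrapy.Spider):
    name = 'jobs'
    allowed_domains = ['zhaopin.com']
    start_urls = ['https://www.zhaopin.com/']

    def start_requests(self):
        # Drive a real browser to run the search, then hand the results URL to Scrapy.
        browser = webdriver.Chrome()
        try:
            browser.get("https://zhaopin.com")
            main_window = browser.current_window_handle
            handles_before = browser.window_handles
            search_box = browser.find_element(By.CLASS_NAME, 'zp-search__input')
            search_box.send_keys('Python')
            time.sleep(1)
            browser.find_element(By.CLASS_NAME, 'zp-search__btn').click()
            # Assumed: the results open in a new tab (as the window-switching loop implies),
            # so wait for it instead of reading window_handles immediately after the click.
            WebDriverWait(browser, 10).until(EC.new_window_is_opened(handles_before))
            for handle in browser.window_handles:
                if handle != main_window:
                    browser.switch_to.window(handle)
                    yield Request(browser.current_url, callback=self.parse)
        finally:
            browser.quit()  # close Chrome once the start requests have been produced

    def parse(self, response):
        # Follow each job link on the search results page.
        le = LinkExtractor(restrict_css='div.contentpile__content__wrapper__item.clearfix')
        for link in le.extract_links(response):
            yield scrapy.Request(link.url, callback=self.parse_job)

    def parse_job(self, response):
        jobs = JobItem()
        sel = response.css('div.main')
        # Assumed markup: an <h1 class="l info-h3"> title and 'l' (lowercase L) class names;
        # '.1' would not be a valid CSS class selector.
        jobs['jobname'] = sel.css('h1.l.info-h3::text').extract_first()
        jobs['Cname'] = sel.css('div.company.l::text').extract_first()
        jobs['salary'] = sel.css('div.l.info-money strong::text').extract_first()
        jobs['joblocation'] = sel.css('span.icon-address::text').extract_first()
        info = sel.css('div.info-three.l')
        # '/text()' selects the text nodes of the n-th span.
        jobs['experience'] = info.xpath('(.//span)[1]/text()').extract_first()
        jobs['education'] = info.xpath('(.//span)[2]/text()').extract_first()
        jobs['count'] = info.xpath('(.//span)[3]/text()').extract_first()
        jobs['jobintro'] = sel.css('div.pos-ul').extract()  # list of matched HTML strings
        yield jobs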

Does this have something to do with cookies? I'd appreciate it if someone more experienced could explain.
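The log above doesn't show anything cookie-related; the ERROR lines are printed by Chrome itself. If the search results page did depend on the browser session, one way to carry it over is to copy Selenium's cookies into the Scrapy request. A minimal sketch, assuming Selenium's browser.get_cookies() (a list of dicts with name and value keys) and the cookies argument of scrapy.Request; the helper name selenium_cookies_to_dict is hypothetical:

```python
from typing import Dict, Iterable, Mapping


def selenium_cookies_to_dict(cookies: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Turn the list returned by Selenium's browser.get_cookies() into the
    {name: value} dict accepted by scrapy.Request(cookies=...)."""
    return {str(cookie["name"]): str(cookie["value"]) for cookie in cookies}


# Hypothetical usage inside start_requests, after the search has run:
#   cookies = selenium_cookies_to_dict(browser.get_cookies())
#   yield Request(url, cookies=cookies, callback=self.parse)
```

A plain {name: value} dict drops each cookie's domain and path; scrapy.Request also accepts a list of dicts with name, value, domain and path keys if those matter.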

Supplement 1 Apr 10, 2019

3 replies

huisezhiyin

Apr 10, 2019

The way you pasted the code makes it really hard to read.

idotfish

Apr 10, 2019

@huisezhiyin Sorry, I've only just started learning Python and don't know much about this yet. Would it be OK to just post a screenshot of the code?

huisezhiyin

Apr 10, 2019

@idotfish Just search for the ERROR message and you'll find the answer.
For example, search for: error:ssl_client_socket_impl.cc(964)] handshake failed
Here is an answer on Stack Overflow:
https://stackoverflow.com/questions/37883759/errorssl-client-socket-openssl-cc1158-handshake-failed-with-chromedriver-chr
If that doesn't work, try the other answers in that thread.
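Fixes commonly suggested for this console message involve passing extra switches to Chrome. A minimal sketch, assuming Selenium's ChromeOptions; the switch list and the helper name make_chrome_options are illustrative choices, not taken from the thread. Because chromedriver answered 200 for the navigation in the log, these switches most likely just quiet Chrome's console. Note that --ignore-certificate-errors also turns off certificate checking, so it is best kept to debugging.

```python
# Chrome switches often suggested for the "handshake failed" console message.
# --ignore-certificate-errors disables certificate checks: use it for debugging only.
CHROME_ARGUMENTS = [
    "--ignore-certificate-errors",
    "--log-level=3",  # only show FATAL messages from Chrome's own console logging
]


def make_chrome_options():
    """Build ChromeOptions carrying the switches above (requires selenium)."""
    # Imported here so the argument list can be inspected without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in CHROME_ARGUMENTS:
        options.add_argument(argument)
    # A widely shared workaround on Windows that keeps Chrome's console logging
    # out of the terminal.
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    return options


# Usage inside start_requests:
#   browser = webdriver.Chrome(options=make_chrome_options())
```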