Scraping 51job data
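1. First import the packages requests and json (the code also imports lxml). I use PyCharm; if a package is missing, PyCharm flags the import and you can click the prompt to install it. PyCharm installation tutorials are easy to find online.
2. The code is as follows: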
import json

import requests
from lxml import etree

BASE_DOMAIN = 'https://search.51job.com'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
}
# Accumulates the records from every page scraped so far.
Recruitments = []


def parse_page(url):
    # Pass the headers by keyword: the second positional parameter of
    # requests.get is params, so requests.get(url, HEADERS) would send the
    # User-Agent as a query string instead of as a request header.
    resp = requests.get(url, headers=HEADERS)
    # The page is decoded as GBK.
    text = resp.content.decode('gbk')
    tree = etree.HTML(text)
    # The title attributes alternate: position, company, position, company, ...
    PositionAndCompany = tree.xpath("//div[@class='el']//span/a/@title")
    Company = PositionAndCompany[1::2]
    Position = PositionAndCompany[::2]
    Workplace = tree.xpath("//div[@class='el']//span[@class='t3']/text()")
    Payroll = tree.xpath("//div[@class='el']//span[@class='t4']/text()")
    Releasetime = tree.xpath("//div[@class='el']//span[@class='t5']/text()")
    for position, company, workplace, payroll, release_time in zip(
            Position, Company, Workplace, Payroll, Releasetime):
        Recruitment = {
            '职位': position,          # position
            '公司': company,           # company
            '工作地点': workplace,     # workplace
            '薪资': payroll,           # salary
            '发布时间': release_time,  # release time
        }
        Recruitments.append(Recruitment)
    # Rewrite the file after each page with everything collected so far.
    with open('51job.json', 'w', encoding='utf-8') as fp:
        json.dump(Recruitments, fp, ensure_ascii=False)


def spider():
    base_urls = 'https://search.51job.com/list/120200%252C010000%252C020000%252C030200%252C040000,000000,0000,00,9,99,python,2,{}.html'
    # Pages 1 through 50.
    for x in range(1, 51):
        page_url = base_urls.format(x)
        parse_page(page_url)
        print('Page %s scraped' % x)


def main():
    spider()


if __name__ == '__main__':
    main()
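The `[::2]` and `[1::2]` slices in parse_page rely on the title list alternating between position and company, and `zip` then lines the columns up row by row. Here is a minimal sketch of that step in isolation. The helper name `pair_up` and all sample values are made up for illustration and are not real scraped data. It also shows one caveat worth knowing: `zip` stops at the shortest input, so if one column is a value short, rows are dropped silently.

```python
import json


def pair_up(titles):
    """Split an alternating [position, company, position, company, ...] list."""
    positions = titles[::2]    # even indexes: 0, 2, 4, ...
    companies = titles[1::2]   # odd indexes: 1, 3, 5, ...
    return list(zip(positions, companies))


# Hypothetical sample values, for illustration only.
titles = ["Python Developer", "Company A", "Data Engineer", "Company B"]
pairs = pair_up(titles)
print(pairs)

# zip stops at the shortest input: with one workplace missing,
# only one combined row survives.
workplaces = ["Shanghai"]  # one value short on purpose
rows = list(zip(pairs, workplaces))
print(len(rows))

# ensure_ascii=False keeps Chinese text readable in the JSON output;
# the default would write \uXXXX escapes instead.
readable = json.dumps({"职位": "开发"}, ensure_ascii=False)
escaped = json.dumps({"职位": "开发"})
print(readable)
print(escaped)
```

If the columns on a page can differ in length, comparing `len()` of each list before zipping is one simple way to notice the mismatch.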
Run results
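One way to check the output is to load 51job.json back and look at the first few records. The sketch below assumes the file was written by the script above into the current directory; the helper names `load_recruitments` and `preview` are hypothetical, and nothing is printed if the file does not exist yet.

```python
import json
import os


def load_recruitments(path="51job.json"):
    """Load the list of job records written by the scraper."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


def preview(records, limit=3):
    """Return the first few records as JSON strings, keeping Chinese text readable."""
    return [json.dumps(record, ensure_ascii=False) for record in records[:limit]]


# Only runs when the scraper has already produced the file.
if __name__ == "__main__" and os.path.exists("51job.json"):
    records = load_recruitments()
    print("records:", len(records))
    for line in preview(records):
        print(line)
```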