This article, collected and organized here, introduces "python爬虫 爬取简历模板" (a Python scraper that downloads resume templates). The editor found it quite good and shares it here for reference.
Overview: scrape the resume templates from 個人簡歷網 ("Personal Resume Network") and save them to a local folder (http://www.gerenjianli.com/moban/index.html).
Code:
import requests
from lxml import etree
import os

if __name__ == '__main__':
    # Use your own browser's User-Agent string here
    headers = {'User-Agent': 'put your own browser UA here'}
    # Create the output directory on first run
    if not os.path.exists('./resumeLibs'):
        os.mkdir('./resumeLibs')
    # Walk the first three pages of the template list
    for pagenum in range(1, 4):
        if pagenum == 1:
            url = 'http://www.gerenjianli.com/moban/index.html'
        else:
            url = 'http://www.gerenjianli.com/moban/index_' + str(pagenum) + '.html'
        response = requests.get(url=url, headers=headers)
        page_text = response.text
        tree = etree.HTML(page_text)
        # Each <li> in the list holds one template entry
        li_list = tree.xpath('//div[@class="list_boby"]/ul[@class="prlist"]/li')
        for li in li_list:
            # Detail-page URL and template name for this entry
            a = li.xpath('./div/a/@href')[0]
            name = li.xpath('./div/a/img/@alt')[0]
            # Repair mojibake: requests decoded the GBK page as ISO-8859-1
            name = name.encode('iso-8859-1').decode('gbk')
            # Fetch the detail page and extract the actual download link
            download_text = requests.get(url=a, headers=headers).text
            tree = etree.HTML(download_text)
            download_href = tree.xpath('//div[@class="donwurl2"]/a/@href')[0]
            # Download the .docx file and write it to disk
            doc_data = requests.get(url=download_href, headers=headers).content
            doc_path = 'resumeLibs/' + name + '.docx'
            with open(doc_path, 'wb') as fp:
                fp.write(doc_data)
            print(name, 'downloaded successfully!')
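A note on the name.encode('iso-8859-1').decode('gbk') round trip: when a server does not declare a charset, requests falls back to ISO-8859-1 when building response.text, so text from a GBK page comes out as mojibake; re-encoding it back to the original bytes and decoding as GBK recovers the real characters. A minimal alternative sketch, assuming the site really does serve GBK-encoded HTML (which that fix implies), is to set the encoding on the response up front:

response = requests.get(url=url, headers=headers)
# Tell requests the real page encoding so .text decodes correctly
# (assumption: the site serves GBK; adjust if its charset differs)
response.encoding = 'gbk'
page_text = response.text  # alt attributes now decode cleanly

With this change the byte-level repair of name can be dropped entirely.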
Summary
The above is the complete content of "python爬虫 爬取简历模板" as collected and organized by 生活随笔. We hope this article helps you solve the problems you have run into.
If you find the content on 生活随笔 worthwhile, feel free to recommend it to your friends.