
Scraping novels from Qidian (起点中文网) with Python

Published 2023/12/14 by 豆豆 (category: python)

This article, collected and compiled by 生活随笔, shows how to scrape novels from Qidian with Python. It is shared here for reference.


Full code:

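```python
import os
import re
from urllib.parse import urljoin

import requests
from lxml import etree

# Browser-like User-Agent header sent with every request
header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/65.0.3325.162 Safari/537.36'}

# Save directory ("Qidian novels"); change it to suit your own setup
SAVE_DIR = '起點小說'


def getbookurls():
    url = 'https://book.qidian.com/info/1017125042#Catalog'
    # Get the page source of the catalog page
    charptes = requests.get(url, headers=header, timeout=10).text
    objects = etree.HTML(charptes)
    # Chapter links; // matches <ul class="cf"> anywhere in the page
    objs = objects.xpath('//ul[@class="cf"]/li')
    clist = []
    for obj in objs:
        try:
            # URL of the chapter
            chapt_urls = obj.xpath('a/@href')[0]
            # Name of the chapter
            chapt_names = obj.xpath('a/text()')[0]
        except IndexError:
            # <li> without a link: skip it
            continue
        # The hrefs are protocol-relative (//...); urljoin adds the https: scheme
        clist.append({'chapt_urls': urljoin(url, chapt_urls),
                      'chapt_names': chapt_names})
    return clist


# Get the text content of one chapter
def getcontent(url):
    res = requests.get(url, headers=header, timeout=10).text
    objects = etree.HTML(res)
    objs = objects.xpath('//div[@class="read-content j_readContent"]/p/text()')
    content = []
    for i in objs:
        # Remove the full-width indent (two U+3000 ideographic spaces)
        text = i.replace('\u3000\u3000', '')
        content.append(text)
    return content


def safe_filename(name):
    # Replace characters that are not allowed in file names
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()


# Download the novel
if __name__ == '__main__':
    os.makedirs(SAVE_DIR, exist_ok=True)
    for i in getbookurls():
        chapt_urls = i['chapt_urls']
        chapt_names = i['chapt_names']
        print("Downloading %s" % chapt_names)
        content = getcontent(chapt_urls)
        # Put each paragraph on its own line
        text = '\n'.join(content)
        # The content is plain text, so save it as a UTF-8 .txt file
        path = os.path.join(SAVE_DIR, '%s.txt' % safe_filename(chapt_names))
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)
```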
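The catalog XPath in `getbookurls` can be tried out without touching the live site. The minimal offline sketch below does that. The HTML snippet, the chapter paths, `BASE_URL` and the helper name `parse_catalog` are hypothetical. The snippet assumes only what the script's selector already assumes: chapter links sit in `<li><a href=...>` items under `<ul class="cf">`. The real Qidian page may be structured differently. The sketch resolves links with `urllib.parse.urljoin`, which handles both protocol-relative links such as `//read.qidian.com/...` and absolute ones.

```python
from urllib.parse import urljoin

from lxml import etree

# Hypothetical catalog markup shaped like what the script's XPath expects
SAMPLE_HTML = """
<html><body>
  <ul class="cf">
    <li><a href="//read.qidian.com/chapter/abc/1">Chapter 1 Opening</a></li>
    <li><a href="//read.qidian.com/chapter/abc/2">Chapter 2 Meeting</a></li>
    <li>no link here</li>
  </ul>
</body></html>
"""

BASE_URL = 'https://book.qidian.com/info/1017125042'


def parse_catalog(html, base_url):
    """Return [{'chapt_urls': ..., 'chapt_names': ...}] for each linked <li>."""
    tree = etree.HTML(html)
    chapters = []
    for li in tree.xpath('//ul[@class="cf"]/li'):
        hrefs = li.xpath('a/@href')
        names = li.xpath('a/text()')
        if not hrefs or not names:
            # Items without a link are skipped instead of raising IndexError
            continue
        chapters.append({'chapt_urls': urljoin(base_url, hrefs[0]),
                         'chapt_names': names[0].strip()})
    return chapters


if __name__ == '__main__':
    for ch in parse_catalog(SAMPLE_HTML, BASE_URL):
        print(ch['chapt_names'], ch['chapt_urls'])
```

Running it prints the two chapter titles, each with its resolved `https://read.qidian.com/...` URL. The `<li>` that has no link is skipped.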

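Summary

That is the complete script. It reads the chapter links and titles from the book's catalog page, downloads the text of each chapter, and saves every chapter to its own file. The XPath expressions are tied to Qidian's page layout at the time of writing, so they may need updating if the site's HTML changes.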
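The chapter selector `//div[@class="read-content j_readContent"]` matches only a div whose class attribute is exactly that string. A single extra class on the page is enough to break it silently, and the script then writes empty chapter files. The minimal offline sketch below compares that exact match with a `contains(@class, ...)` test. The chapter snippet and the helper name `chapter_paragraphs` are hypothetical and do not reflect Qidian's actual markup.

```python
from lxml import etree

# Hypothetical chapter markup: the div carries one extra class
SAMPLE_CHAPTER = """
<html><body>
  <div class="read-content j_readContent font-size-18">
    <p>\u3000\u3000First paragraph.</p>
    <p>\u3000\u3000Second paragraph.</p>
  </div>
</body></html>
"""

# Selector used by the script: the class attribute must match exactly
EXACT_XPATH = '//div[@class="read-content j_readContent"]/p/text()'
# Looser selector: any div whose class attribute contains "j_readContent"
CONTAINS_XPATH = '//div[contains(@class, "j_readContent")]/p/text()'


def chapter_paragraphs(html, xpath):
    """Return the paragraph texts selected by xpath, minus the full-width indent."""
    tree = etree.HTML(html)
    return [p.replace('\u3000\u3000', '').strip() for p in tree.xpath(xpath)]


if __name__ == '__main__':
    print(chapter_paragraphs(SAMPLE_CHAPTER, EXACT_XPATH))     # []
    print(chapter_paragraphs(SAMPLE_CHAPTER, CONTAINS_XPATH))  # ['First paragraph.', 'Second paragraph.']
```

`contains()` is a substring test, so it would also match a class such as `j_readContentOld`. The stricter XPath 1.0 idiom is `contains(concat(' ', normalize-space(@class), ' '), ' j_readContent ')`.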
