Automatically downloading novels with Python
Hi everyone! The empty, endless three-day long weekend is finally over. Hahaha, another delightful Monday.
I'd barely walked into the office and hadn't even sat down when the iOS guy next to me quietly informed me the project had a BUG, and shot me a mysterious smile...
After I'd finished breakfast, downed some hot water, and made a bathroom run, my hands finally stopped shaking. I slowly opened my laptop, only to discover it was a tiny issue. Hahahaha, fixed in one minute. Oh man, am I good or what.
The iOS guy leaned over and, in his not-quite-standard Mandarin, said: "Bro, that image crawler from last time works great, but I've been staring at pictures for days now. Plenty of inspiration, just more than the body can take. Lately I'm hooked on web novels, except I can't afford the legit sites and the pirated ones are buried in ads. Help me out?"
Me: "How can an elegant corporate drone read pirated novels? I despise you, you shameless wretch. Send it over so I can report (save) it first."
Half an hour later, the iOS guy: "Bro, how's it going? Solved yet?"
Me, inwardly: crap, I got sucked into reading it and haven't written a single line. Now what?
Outwardly, steady as a rock: "Emmmmm, this one is technically challenging. Hang on, almost there."
Of course, the novel must be read, but the problem must also be solved. How do I keep only the content I want and drop everything I don't? A flash of insight: a crawler, obviously. Pull everything down and save only the parts worth reading. Let's get to it...
The book is 《全職法師》; the URL is in the code.
The page looks roughly like this:
The crawler trifecta: fetch the page, parse the page, save the target.
A quick-and-dirty first version, saving each chapter to a file named after its title:
import queue
import requests
from lxml import etree as et
import re
import random
import time
import os

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

USER_AGENT_LIST = [
    'MSIE (MSIE 6.0; X11; Linux; i686) Opera 7.23',
    'Opera/9.20 (Macintosh; Intel Mac OS X; U; en)',
    'Opera/9.0 (Macintosh; PPC Mac OS X; U; en)',
    'iTunes/9.0.3 (Macintosh; U; Intel Mac OS X 10_6_2; en-ca)',
    'Mozilla/4.76 [en_jp] (X11; U; SunOS 5.8 sun4u)',
    'iTunes/4.2 (Macintosh; U; PPC Mac OS X 10.2)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0) Gecko/20100101 Firefox/9.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20120813 Firefox/16.0',
    'Mozilla/4.77 [en] (X11; I; IRIX;64 6.5 IP30)',
    'Mozilla/4.8 [en] (X11; U; SunOS; 5.7 sun4u)',
]

def save_file(dir, filename, content):
    # Create the output directory on first use, then write one chapter per .txt file
    if not os.path.exists(dir):
        os.makedirs(dir)
    save_path = dir + '/' + filename + '.txt'
    with open(save_path, 'w', encoding='utf-8') as f:
        f.write(content)
    print('ok')

def get_chapter_url(list_url, base_url, queue):
    # Fetch the chapter index and queue up (page_url, chapter_number) pairs
    response = requests.get(url=list_url, headers=headers)
    if response.status_code == 200:
        html = et.HTML(response.content)
        chapter_url = html.xpath('//*[@id="list"]/dl/dd/a/@href')[9:40]
        k = 1
        for i in chapter_url:
            page_url = base_url + i
            queue.put((page_url, str(k)))
            k = k + 1

def get_detail_html(queue):
    # Drain the queue: download each chapter page, extract title and body, save to disk
    while not queue.empty():
        time.sleep(5)  # the site returns 503 if you request too fast
        page_url, chapter_num = queue.get()
        queue.task_done()
        headers = {'User-Agent': random.choice(USER_AGENT_LIST)}  # rotate the UA per request
        response = requests.get(url=page_url, headers=headers)
        response.encoding = 'utf-8'
        if response.status_code == 200:
            html = et.HTML(response.content)
            title = html.xpath('//h1/text()')[0].strip()
            content = ''.join(html.xpath('//*[@id="content"]/text()')).strip()
            save_file(save_dir, title, content)
            print(title)
        else:
            print(response.status_code)

if __name__ == "__main__":
    base_url = 'https://www.biqugecom.com'
    list_url = 'https://www.biqugecom.com/0/15/'
    save_dir = os.path.abspath('../quanzhifashi/')
    urls_queue = queue.Queue()
    get_chapter_url(list_url, base_url, urls_queue)
    get_detail_html(urls_queue)
    print('the end!')
Give it a run:
The saved novel:
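One design note on this first version: even though it runs single-threaded, the chapter URLs already flow through a queue.Queue rather than a plain list. That costs nothing here, and it is exactly the shape you need when the script later grows worker threads (see the sketch near the end of the post).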
A while later, iOS: "Bro, these txt files put me in a tough spot. The user experience is meh. Optimize it?"
After five seconds of deep thought, me: "How about I wrap it in an API, and you write whatever app you like against it? Then you can DIY the UI, and even match the novel's background color to your underwear."
iOS: "Nice, nice. Though I don't really wear underwear, hehe."
Me: "..........."
The bragging is done; the code still has to be written. So many databases out there, which brand to pick? When it comes to databases, MySQL has your back: easy to use, open source, genuinely great.
First, install a MySQL driver: pip3 install mysql-connector
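(Note: on PyPI the actively maintained package is mysql-connector-python; mysql-connector is an older distribution, though both are imported as mysql.connector.) A quick smoke test to confirm the driver and credentials work — the root/root login and the test database here are the same assumptions the code below makes:

import mysql.connector

# Connect with the same (assumed) credentials the scraper uses and print the server version
conn = mysql.connector.connect(user='root', password='root', database='test')
cursor = conn.cursor()
cursor.execute('SELECT VERSION()')
print(cursor.fetchone())
cursor.close()
conn.close()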
Create a table to hold the chapter number, title, and body:
DROP TABLE IF EXISTS `novel`;
CREATE TABLE `novel` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `chapter_num` int(11) DEFAULT NULL COMMENT 'chapter number',
  `title` varchar(20) NOT NULL DEFAULT '0' COMMENT 'chapter title',
  `content` varchar(15000) DEFAULT NULL COMMENT 'chapter body',
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=112 DEFAULT CHARSET=utf8;
Then write a method to store the data, hit run with fingers crossed, and get the following:
Open MySQL and take a look.
Ha, I truly am the chosen one.
Full code:
import queue
import requests
from lxml import etree as et
import re
import random
import time
import os
import mysql.connector

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

USER_AGENT_LIST = [
    'MSIE (MSIE 6.0; X11; Linux; i686) Opera 7.23',
    'Opera/9.20 (Macintosh; Intel Mac OS X; U; en)',
    'Opera/9.0 (Macintosh; PPC Mac OS X; U; en)',
    'iTunes/9.0.3 (Macintosh; U; Intel Mac OS X 10_6_2; en-ca)',
    'Mozilla/4.76 [en_jp] (X11; U; SunOS 5.8 sun4u)',
    'iTunes/4.2 (Macintosh; U; PPC Mac OS X 10.2)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0) Gecko/20100101 Firefox/9.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20120813 Firefox/16.0',
    'Mozilla/4.77 [en] (X11; I; IRIX;64 6.5 IP30)',
    'Mozilla/4.8 [en] (X11; U; SunOS; 5.7 sun4u)',
]

def save_file(dir, filename, content):
    # Kept from the first version; unused once chapters go into MySQL
    if not os.path.exists(dir):
        os.makedirs(dir)
    save_path = dir + '/' + filename + '.txt'
    with open(save_path, 'w', encoding='utf-8') as f:
        f.write(content)
    print('ok')

def insert_data(chapter_num, title, content):
    # One short-lived connection per chapter; fine at this small scale
    conn = mysql.connector.connect(user='root', password='root', database='test')
    cursor = conn.cursor()
    try:
        cursor.execute(
            'insert into novel (chapter_num, title, content) values (%s, %s, %s)',
            [chapter_num, title, content])
        conn.commit()
    except Exception as e:
        print('Error:', e)
    finally:
        cursor.close()
        conn.close()

def get_chapter_url(list_url, base_url, queue):
    # Fetch the chapter index and queue up (page_url, chapter_number) pairs
    response = requests.get(url=list_url, headers=headers)
    if response.status_code == 200:
        html = et.HTML(response.content)
        chapter_url = html.xpath('//*[@id="list"]/dl/dd/a/@href')[9:40]
        k = 1
        for i in chapter_url:
            page_url = base_url + i
            queue.put((page_url, str(k)))
            k = k + 1

def get_detail_html(queue):
    # Drain the queue: download each chapter page, extract title and body, store in MySQL
    while not queue.empty():
        time.sleep(5)  # the site returns 503 if you request too fast
        page_url, chapter_num = queue.get()
        queue.task_done()
        headers = {'User-Agent': random.choice(USER_AGENT_LIST)}  # rotate the UA per request
        response = requests.get(url=page_url, headers=headers)
        response.encoding = 'utf-8'
        if response.status_code == 200:
            html = et.HTML(response.content)
            title = html.xpath('//h1/text()')[0].strip()
            content = ''.join(html.xpath('//*[@id="content"]/text()')).strip()
            insert_data(chapter_num, title, content)
            print(title)
        else:
            print(response.status_code)

if __name__ == "__main__":
    base_url = 'https://www.biqugecom.com'
    list_url = 'https://www.biqugecom.com/0/15/'
    save_dir = os.path.abspath('../quanzhifashi/')  # leftover from the file version
    urls_queue = queue.Queue()
    get_chapter_url(list_url, base_url, urls_queue)
    get_detail_html(urls_queue)
    print('the end!')
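As for the promised interface: the post never actually shows it, but with the chapters sitting in MySQL it is a small step. The sketch below is my own illustration, not the author's code — Flask, the route shape, and the JSON fields are all assumptions:

from flask import Flask, jsonify
import mysql.connector

app = Flask(__name__)

@app.route('/chapter/<int:num>')
def chapter(num):
    # Hypothetical endpoint: look up a single chapter by its number
    conn = mysql.connector.connect(user='root', password='root', database='test')
    cursor = conn.cursor()
    cursor.execute('select title, content from novel where chapter_num = %s', (num,))
    row = cursor.fetchone()
    cursor.close()
    conn.close()
    if row is None:
        return jsonify({'error': 'no such chapter'}), 404
    return jsonify({'title': row[0], 'content': row[1]})

if __name__ == '__main__':
    app.run()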
A while later, the iOS guy again: "Bro, this thing is kind of slow. It naps a few seconds between chapters."
Me: "The site rate-limits; go any faster and you get a 503. Besides, even reading ten lines at a glance, you couldn't keep up with it."
iOS: "Reading is fine. It's just that code like this cramps my bragging."
Me: "Emmmmm, fine, then I'll set up an IP pool and make it multithreaded."
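(He never gets that far, as you'll see below, but for the record here is a minimal sketch of what that might look like. The proxy addresses are placeholders, not real ones, and the worker reuses random, requests, et, USER_AGENT_LIST, insert_data, and urls_queue from the full code above; a real IP pool would also need sourcing and health checks.)

import threading

PROXY_LIST = ['http://1.2.3.4:8080', 'http://5.6.7.8:3128']  # placeholder proxies

def worker(q):
    # Same loop as get_detail_html, minus the sleep: each thread routes through a random proxy
    while not q.empty():
        page_url, chapter_num = q.get()
        try:
            proxies = {'http': random.choice(PROXY_LIST), 'https': random.choice(PROXY_LIST)}
            headers = {'User-Agent': random.choice(USER_AGENT_LIST)}
            response = requests.get(url=page_url, headers=headers, proxies=proxies, timeout=10)
            response.encoding = 'utf-8'
            if response.status_code == 200:
                html = et.HTML(response.content)
                title = html.xpath('//h1/text()')[0].strip()
                content = ''.join(html.xpath('//*[@id="content"]/text()')).strip()
                insert_data(chapter_num, title, content)
        finally:
            q.task_done()

threads = [threading.Thread(target=worker, args=(urls_queue,), daemon=True) for _ in range(4)]
for t in threads:
    t.start()
urls_queue.join()  # block until every queued chapter has been processed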
Just as I was gearing up to build the IP pool, the product manager's avatar started flashing.
In the message list lay a requirements doc, with a note attached:
Emmmmm, time to make myself scarce.
The end!
If you've learned something, teach it; if you've earned something, share it.
Pass the torch, and the flame never dies.