Scraping Douban Books with Python (爬取豆瓣读书.py)

Published: 2023/12/16 python 22 豆豆

This article, collected by 生活随笔, presents the script python爬取豆瓣读书_爬取豆瓣读书.py. The editor found it useful and is sharing it here for reference.

import csv
import random
import time

import pymongo
import requests
from fake_useragent import UserAgent
from pyquery import PyQuery as pq

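'''
I have not learned the csv module well, so the write2csv function below is
commented out: it fails when run. I will improve it once I know csv better;
for now the data goes straight into a database.

The Douban listing is a static page: all the data we want is rendered in the
page itself. There is no request signing or encryption to work around, so we
simply fetch the HTML and extract the fields.

If you know csv and want CSV output, fix write2csv. Otherwise, change the
database host below and write the data to your own database.

With the data in a database, queries are easier than with a CSV file, for
example finding the books rated 9.0.

If your database listens on a non-default port, add the port argument. If it
requires a username and password, add those yourself as well.
'''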

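# Replace the placeholder with the host of your own MongoDB instance.
client = pymongo.MongoClient(host='your-database-host')
db = client.Douban_reading
coll = db.text

# Supplies a random User-Agent string for each request.
ua = UserAgent()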

def parsing(page):
    """Fetch the listing at start offset `page`, store each book, return the rows."""
    URL = 'https://read.douban.com/kind/100?start={}&sort=hot&promotion_only=False&min_price=None&max_price=None&works_type=None'.format(page)
    headers = {'User-Agent': ua.random}
    html = requests.get(URL, headers=headers).text
    doc = pq(html)
    rows = []
    for item in doc('.item.store-item').items():
        info = {
            # Book title
            '書名': item.find('.title').text(),
            # Author
            '作者': item.find('.author-item').text(),
            # Translator. NOTE: this is the same selector as the author, so the
            # value repeats the author text; a translator-specific selector is needed.
            '譯者': item.find('.author-item').text(),
            # Average rating (stored as text, not as a number)
            '書的評分': item.find('.rating-average').text(),
            # Number of ratings
            '多少人評價': item.find('.ratings-link').text(),
            # Price
            '書的價格': item.find('.original-tag').text(),
            # Short description
            '書的簡介': item.find('.article-desc-brief').text(),
        }
        coll.insert_one(info)
        print(info)
        rows.append(info)
    return rows

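'''
def write2csv(page):
    print('Writing to the CSV file')
    with open('豆瓣讀書熱門列表.csv', 'a', newline='', encoding='utf8') as f:
        fieldnames = ['書名', '作者', '譯者', '書的評分', '多少人評價', '書的價格', '書的簡介']
        # extrasaction='ignore' skips the '_id' key that insert_one adds to each dict.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction='ignore')
        # Write the header only while the file is still empty.
        if f.tell() == 0:
            writer.writeheader()
        # parsing() must return a list of dicts for anything to be written.
        rows = parsing(page) or []
        writer.writerows(rows)
    print('Write succeeded')
'''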

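# 744 pages in total; the start offset advances by 20 per page.
for page_index in range(0, 744):
    try:
        parsing(page_index * 20)
        # Pause for 0 to 9 seconds between requests.
        time.sleep(random.randint(0, 9))
    except Exception as e:
        print(e.args)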
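The commented-out write2csv in the script passes the return value of parsing straight to writer.writerow and writes the header on every call. Below is a minimal sketch of one way to write the rows, assuming the scraped books are available as a list of dicts. The function name write_rows_to_csv, the write_header flag, and the sample row are made up for illustration and are not part of the original script. The '_id' key in the sample mimics what pymongo's insert_one adds to a dict after insertion.

```python
import csv
import io

# Column names copied from the script's fieldnames list.
FIELDNAMES = ['書名', '作者', '譯者', '書的評分', '多少人評價', '書的價格', '書的簡介']


def write_rows_to_csv(f, rows, write_header):
    """Write a list of dicts to an open text file object (hypothetical helper).

    restval='' fills columns missing from a row, and extrasaction='ignore'
    drops keys that are not in FIELDNAMES, such as '_id'.
    """
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES,
                            restval='', extrasaction='ignore')
    if write_header:
        writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Demo with an in-memory file and an invented row (not real Douban data).
sample_rows = [{'_id': 'abc', '書名': 'Example Title',
                '作者': 'Example Author', '書的評分': '9.0'}]
buffer = io.StringIO(newline='')
written = write_rows_to_csv(buffer, sample_rows, write_header=True)
csv_text = buffer.getvalue()
```

With a real file, open it with newline='' and encoding='utf8' as the original function does, and pass write_header=True only for the first page so the header appears once.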


Summary

That is the full content of python爬取豆瓣读书_爬取豆瓣读书.py as collected by 生活随笔. We hope it helps you solve the problem you ran into.
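The note at the top of the script mentions a custom port, a username and password, and querying for books rated 9.0. Here is a minimal sketch of those two steps, under some assumptions: the helper names build_mongo_uri, to_score, and find_by_min_score are hypothetical, and the host and credentials are invented. The script stores the rating as text, so a numeric comparison such as $gte only matches if the rating is converted to a number before insertion, which is what to_score is for.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port=27017, username=None, password=None):
    """Build a mongodb:// URI (hypothetical helper).

    The username and password are percent-escaped so that characters
    such as '@' or '/' do not break the URI.
    """
    if username and password:
        auth = '{}:{}@'.format(quote_plus(username), quote_plus(password))
    else:
        auth = ''
    return 'mongodb://{}{}:{}'.format(auth, host, port)


def to_score(text):
    """Convert scraped rating text to a float, or None if it is not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def find_by_min_score(coll, min_score):
    """Query a pymongo collection for books rated at least min_score.

    Assumes '書的評分' was stored as a number, e.g. via to_score().
    """
    return coll.find({'書的評分': {'$gte': min_score}})


# Invented credentials, for illustration only.
uri = build_mongo_uri('localhost', 27017, 'user', 'p@ss/word')
```

The resulting string can be passed to pymongo.MongoClient(uri), and find_by_min_score(coll, 9.0) then returns a cursor over the matching books. If the ratings stay as text, an exact match such as coll.find({'書的評分': '9.0'}) is the alternative.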
