
Scraping Douyu Images

Published: 2023/12/8 · 豆豆

Create the project:

scrapy startproject douyu

Write items.py

import scrapy

class DouyuItem(scrapy.Item):
    nickname = scrapy.Field()    # streamer nickname
    imagelink = scrapy.Field()   # image URL (vertical_src)
    imagePath = scrapy.Field()   # local path after download

Generate a basic spider:

scrapy genspider douyutupian capi.douyucdn.cn

The API endpoint, discovered by capturing the phone app's traffic, returns data in JSON format.
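To make the payload shape concrete, here is a minimal sketch of parsing a response body like the one the spider below expects. The field names (`data`, `nickname`, `vertical_src`) come from the spider code; the sample values are invented for illustration.

```python
import json

# Hypothetical sample of the getVerticalRoom response body
# (structure only; the values are made up)
sample = """
{
  "error": 0,
  "data": [
    {"nickname": "streamer_a", "vertical_src": "http://example.com/a.jpg"},
    {"nickname": "streamer_b", "vertical_src": "http://example.com/b.jpg"}
  ]
}
"""

# "data" is a list of room dicts, one per streamer
rooms = json.loads(sample)["data"]
for room in rooms:
    print(room["nickname"], room["vertical_src"])
```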

douyutupian.py

import scrapy
from douyu.items import DouyuItem
import json


class DouyumeinvSpider(scrapy.Spider):
    name = "douyutupian"
    allowed_domains = ["capi.douyucdn.cn"]

    offset = 0
    url = "http://capi.douyucdn.cn/api/v1/getVerticalRoom?limit=20&offset="

    start_urls = [url + str(offset)]

    def parse(self, response):
        # Convert the JSON response to Python; the "data" field is a list
        data = json.loads(response.text)["data"]
        for each in data:
            item = DouyuItem()
            item["nickname"] = each["nickname"]
            item["imagelink"] = each["vertical_src"]

            yield item

        # Request the next page
        self.offset += 20
        yield scrapy.Request(self.url + str(self.offset), callback=self.parse)
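Note that parse() above requests the next page unconditionally, so the spider never stops on its own. When the API runs out of rooms, the "data" list comes back empty; a stop condition can be sketched with a hypothetical helper (not part of the original spider):

```python
def next_offset(data, offset, step=20):
    """Return the next offset to request, or None when the page is empty."""
    if not data:
        return None
    return offset + step

# Inside parse(), the spider would only yield the next Request
# when next_offset(...) is not None.
print(next_offset([{"nickname": "a"}], 0))  # 20
print(next_offset([], 40))                  # None
```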

The pipeline file, pipelines.py

import scrapy
from scrapy.utils.project import get_project_settings
from scrapy.pipelines.images import ImagesPipeline
import os

# Subclass the built-in ImagesPipeline (the local name shadows the import)
class ImagesPipeline(ImagesPipeline):
    # Read the storage directory configured in settings.py
    IMAGES_STORE = get_project_settings().get("IMAGES_STORE")

    def get_media_requests(self, item, info):
        image_url = item["imagelink"]
        yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # results is a list of (success, detail) tuples
        image_paths = [x["path"] for ok, x in results if ok]

        # Rename the downloaded file to the streamer's nickname
        os.rename(os.path.join(self.IMAGES_STORE, image_paths[0]),
                  os.path.join(self.IMAGES_STORE, item["nickname"] + ".jpg"))

        item["imagePath"] = os.path.join(self.IMAGES_STORE, item["nickname"] + ".jpg")

        return item
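item_completed receives a list of (success, detail) tuples, one per requested download. A quick sketch of extracting the stored paths from that structure (the dicts here are invented, but they match the "path" key the pipeline uses):

```python
# Hypothetical results list shaped like what ImagesPipeline passes in:
# each entry is (success_flag, info_dict_or_failure)
results = [
    (True,  {"url": "http://example.com/a.jpg", "path": "full/abc123.jpg"}),
    (False, Exception("download failed")),
]

# Keep only the paths of successful downloads
image_paths = [x["path"] for ok, x in results if ok]
print(image_paths)  # ['full/abc123.jpg']
```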


settings.py

BOT_NAME = 'douyu'

SPIDER_MODULES = ['douyu.spiders']
NEWSPIDER_MODULE = 'douyu.spiders'

# Mimic the Douyu mobile client the API expects
DEFAULT_REQUEST_HEADERS = {
    "User-Agent": "DYZB/1 CFNetwork/808.2.16 Darwin/16.3.0"
}

ITEM_PIPELINES = {
    'douyu.pipelines.ImagesPipeline': 300,
}

IMAGES_STORE = "../../Images"


Reposted from: https://www.cnblogs.com/wanglinjie/p/9240373.html

Summary

That covers the full walkthrough of scraping Douyu images; hopefully it helps with any problems you ran into.