
Scraping LOL Hero Data with the Scrapy Framework

Published: 2023/12/15


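Scrapy in Practice

Goal: scrape the basic information for every League of Legends (LOL) hero (name, background story, and skill names and descriptions), and download every hero's skins to local disk.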

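Start from the LOL official site's home page and open the page that lists all heroes.

Here is the approach I tried first:

Read the data straight out of the page's HTML source, which is the most basic way to scrape.

Looking at the URL of an individual hero's page, the pattern is easy to spot: every hero's detail page has the same URL, and only the value of the id parameter differs.

So each hero's id could be collected from the hero list page and used to build the detail page URL.

That was the idea, at least. In practice I never got the data I wanted: the text inside the li tags always came back as the "Loading" placeholder.

Eventually I realized that the hero data is fetched through AJAX requests, so it is not present in the initial HTML and the traditional approach cannot work.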

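So I switched to a different approach:

Fetch the js file that stores the hero information directly, read each hero's id from it, and then build the address of each hero's detail page by concatenating the URL.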
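As a minimal sketch of that id-to-URL step, the snippet below parses a hero list payload and builds one detail URL per hero. The base URL and the `hero` / `heroId` keys are the ones the spider in this article uses; the function name `build_detail_urls` and the two-entry sample payload are made up for illustration and are not real data from the site.

```python
import json

# Base URL taken from the spider; each hero's detail file is "<heroId>.js".
HERO_DETAIL_BASE = "https://game.gtimg.cn/images/lol/act/img/js/hero/"


def build_detail_urls(hero_list_body):
    """Return one detail-file URL per entry in the hero list payload."""
    data = json.loads(hero_list_body)
    return [f"{HERO_DETAIL_BASE}{hero['heroId']}.js" for hero in data["hero"]]


# Made-up sample with the same shape the spider reads (not real site data).
sample_list_body = '{"hero": [{"heroId": "1"}, {"heroId": "2"}]}'
detail_urls = build_detail_urls(sample_list_body)
print(detail_urls)
```

Because the file is plain JSON despite its .js extension, `json.loads` is enough and no HTML parsing is involved.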

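The hero detail page also loads its data through AJAX.

The js file it fetches contains the data we want.

The hero information and the skin image URLs can be read from it directly.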
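To make the shape of that detail file concrete, here is a small sketch that pulls out the fields used in this article. The keys (`hero`, `name`, `title`, `shortBio`, `skins`, `mainImg`, `spells`, `description`) are the ones the spider reads; the helper name `extract_hero_fields` and every value in the sample payload are placeholders invented for the example. Unlike the spider, this sketch drops empty skin URLs at extraction time rather than in the pipeline.

```python
import json


def extract_hero_fields(detail_body):
    """Pick the name, title, bio, skin URLs and skills out of one detail payload."""
    data = json.loads(detail_body)
    hero = data["hero"]
    # Some skin entries have an empty mainImg; skip those.
    skin_urls = [skin["mainImg"] for skin in data["skins"] if skin["mainImg"]]
    # Strip the <br> tags embedded in skill descriptions.
    skills = [
        (spell["name"], spell["description"].replace("<br>", ""))
        for spell in data["spells"]
    ]
    return {
        "name": hero["name"],
        "title": hero["title"],
        "short_bio": hero["shortBio"],
        "skin_urls": skin_urls,
        "skills": skills,
    }


# Placeholder payload for illustration only (not real site data).
sample_detail_body = json.dumps({
    "hero": {"name": "Hero A", "title": "Title A", "shortBio": "Short bio."},
    "skins": [
        {"mainImg": "https://example.com/skins/1000.jpg"},
        {"mainImg": ""},
    ],
    "spells": [{"name": "Skill Q", "description": "Line one.<br>Line two."}],
})
fields = extract_hero_fields(sample_detail_body)
print(fields)
```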

Spider code:
lolheros_info.py

# -*- coding: utf-8 -*-
import json

import scrapy

from lolheros.items import LolherosItem


class LolherosInfoSpider(scrapy.Spider):
    name = 'lolheros_info'
    # The JSON files are served from game.gtimg.cn, not lol.qq.com.
    allowed_domains = ['game.gtimg.cn']
    start_urls = ['https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js']

    def parse(self, response):
        # The hero list file is JSON; every entry carries that hero's id.
        datas = json.loads(response.body)
        heros_list = datas['hero']
        for hero_info in heros_list:
            hero_id = hero_info['heroId']
            heroinfo_url = "https://game.gtimg.cn/images/lol/act/img/js/hero/" + hero_id + ".js"
            yield scrapy.Request(heroinfo_url, callback=self.parse_heroinfo, dont_filter=True)

    def parse_heroinfo(self, response):
        datas = json.loads(response.body)

        # Basic information
        hero_info = datas['hero']
        hero_nickname = hero_info['name']
        hero_realname = hero_info['title']
        hero_background = hero_info['shortBio']

        # Skin image URLs (some may be empty; the pipeline skips those)
        hero_skin_urls = [hero_skin['mainImg'] for hero_skin in datas['skins']]

        # Skills, joined as "(name:description)" with <br> tags removed
        hero_skills_str = ""
        for hero_skill in datas['spells']:
            hero_skills_str += "(" + str(hero_skill['name']) + ":" + str(hero_skill['description']).replace('<br>', '') + ")"

        hero_info_list = [hero_nickname, hero_realname, hero_background, hero_skills_str]
        # LolherosItem (items.py, not shown here) must declare the fields
        # hero_info_list and hero_skin_urls.
        yield LolherosItem(hero_info_list=hero_info_list, hero_skin_urls=hero_skin_urls)

Data-processing code:
pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
import os
from urllib import request

import xlwt


class LolherosPipeline(object):
    current_row = 1  # row 0 holds the column headers
    savepath = "LOL英雄信息.xls"  # file name: "LOL hero information"
    book = xlwt.Workbook(encoding="utf-8", style_compression=0)
    sheet = book.add_sheet('LOL英雄信息', cell_overwrite_ok=True)
    # Column headers: nickname, name, background story, skill descriptions
    col = ("昵稱", "名字", "背景故事", "技能介紹")

    def open_spider(self, spider):
        print("Scraping started")
        # Skins go into an "images" folder next to the project package.
        self.image_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "images")
        os.makedirs(self.image_path, exist_ok=True)
        # Write the header row once instead of on every item.
        for i, title in enumerate(self.col):
            self.sheet.write(0, i, title)

    def process_item(self, item, spider):
        hero_skin_urls = item['hero_skin_urls']
        hero_info_list = item['hero_info_list']

        # Save the hero data to the Excel sheet.
        for i, value in enumerate(hero_info_list):
            self.sheet.write(self.current_row, i, value)
        self.current_row += 1
        self.book.save(self.savepath)

        # Download the skins into a folder named after the hero.
        hero_name = hero_info_list[0]
        image_category = os.path.join(self.image_path, hero_name)
        os.makedirs(image_category, exist_ok=True)
        for hero_skin_url in hero_skin_urls:
            if hero_skin_url != '':
                image_name = hero_skin_url.split('/')[-1]
                request.urlretrieve(hero_skin_url, os.path.join(image_category, image_name))
        return item

    def close_spider(self, spider):
        print("Scraping finished")
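As the generated comment at the top of pipelines.py points out, the pipeline only runs once it is registered in the ITEM_PIPELINES setting. The article does not show settings.py, so the fragment below is a guess at what it would contain: the dotted path assumes the project package is named `lolheros` (inferred from the spider's `from lolheros.items import ...` line), and 300 is simply the value Scrapy's project template suggests.

```python
# settings.py (fragment)
# Assumption: the project package is "lolheros", as the spider's import suggests.
# The integer sets the run order among pipelines; lower values run first.
ITEM_PIPELINES = {
    "lolheros.pipelines.LolherosPipeline": 300,
}
```

With this in place, running `scrapy crawl lolheros_info` from the project directory passes every item the spider yields to `process_item`.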

Results:

Basic information for all heroes (saved to Excel)

Skin images for all heroes
