
Crawling NetEase Cloud Music Songs with Over 10,000 Comments

Published: 2024/3/13

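The approach I have seen from others online is to crawl every playlist and then filter out the songs with more than 10,000 comments. Playlists overlap with each other, though, so that approach is probably inefficient, and some songs may still slip through. Instead, I decided to crawl every artist, fetch each artist's top 50 songs, and filter those for songs with more than 10,000 comments. At the moment almost no artist has more than 50 songs above that threshold, so this approach works for now.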

On the artist page, artists are divided into Chinese male artists, Chinese female artists, Western male artists, and so on, 15 categories in total. The category codes are:

group = ['1001', '1002', '1003', '2001', '2002', '2003', '6001', '6002', '6003', '7001', '7002', '7003', '4001', '4002', '4003']

Each category is further split by initial letter into 27 sub-pages (including the popular-artists page). The sub-page codes are:

initial = ['0']
for i in range(65, 91):
    initial.append(str(i))

15 × 27 = 405, so there are 405 artist sub-pages to crawl. Their links can be built by concatenating the codes above:

urls = []
for g in group:
    for i in initial:
        url = 'http://music.163.com/discover/artist/cat?id=' + g + '&initial=' + i
        urls.append(url)
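
As a cross-check on the 15 × 27 = 405 count, here is a minimal alternative sketch of the same URL construction using `itertools.product` and `urllib.parse.urlencode`. The helper name `build_urls` and the constant `BASE` are illustrative and not part of the original script.

```python
from itertools import product
from urllib.parse import urlencode

BASE = 'http://music.163.com/discover/artist/cat'  # same endpoint as above

group = ['1001', '1002', '1003', '2001', '2002', '2003', '6001', '6002',
         '6003', '7001', '7002', '7003', '4001', '4002', '4003']
# '0' is the popular-artists page; 65..90 are the ASCII codes of A..Z
initial = ['0'] + [str(code) for code in range(65, 91)]


def build_urls(groups, initials):
    """Return one sub-page URL per (category, initial) pair."""
    return [BASE + '?' + urlencode({'id': g, 'initial': i})
            for g, i in product(groups, initials)]


urls = build_urls(group, initial)
print(len(urls))   # 405
print(urls[0])     # http://music.163.com/discover/artist/cat?id=1001&initial=0
```

The resulting list has the same order as the nested loops above, so it can be dropped in wherever `urls` is used.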

Next, a crawler collects the artist ids from these pages:

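import re
from urllib import request
from bs4 import BeautifulSoup as bs

# Return the list of artist ids found on one sub-page, or None after repeated failures
def get_artist(url):
    k = 0  # number of failed attempts so far
    t = []
    while True:
        try:
            resp = request.urlopen(url)
            html = resp.read().decode('utf-8')
            soup = bs(html, 'html.parser')
            l = soup.find_all('a', class_='nm nm-icn f-thide s-fc0')
            # the id is the numeric query value in the link's href
            p = r'\s*\/[a-z]+\?[a-z]+\=([0-9]+)'
            for i in l:
                t.append(re.match(p, i['href']).group(1))
            return t
        except Exception as e:
            print(e)
            k += 1
            if k > 10:
                print('Page ' + url + ' failed')
                return None
            t = []  # discard partial results before retrying
            continue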
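
To make the id extraction easier to follow, the sketch below applies the same regular expression to a few sample `href` values. The sample strings and the helper name `extract_id` are made up for illustration; the real page markup may differ. Unlike the inline version, this helper returns `None` when the pattern does not match instead of raising.

```python
import re

# Same pattern as in get_artist, compiled once
ID_PATTERN = re.compile(r'\s*\/[a-z]+\?[a-z]+\=([0-9]+)')


def extract_id(href):
    """Return the numeric id from an href such as '/artist?id=123', or None."""
    m = ID_PATTERN.match(href)
    return m.group(1) if m else None


# Hypothetical href values, including ones that should not match
sample_hrefs = [' /artist?id=12345', '/artist?id=678', '/user/home?id=999', 'javascript:;']
ids = [extract_id(h) for h in sample_hrefs]
print(ids)
```

In `get_artist`, a non-matching link makes `re.match` return `None`, so `.group(1)` raises and the whole page is retried; filtering out `None` values as above is one way to avoid that.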

With the artist ids in hand, the crawler visits each artist's page and collects the song ids of the top 50 songs:

# Uses the same imports as get_artist (re, request, bs).
# Return a list of (music_id, artist_id) tuples for one artist, or None on failure
def get_song(artist_id):
    k = 0  # number of failed attempts so far
    t = []
    while True:
        url = 'http://music.163.com/artist?id=' + artist_id
        try:
            req = request.Request(url)
            req.add_header('User-Agent',
                           'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4399.400 QQBrowser/9.7.12777.400')
            req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
            resp = request.urlopen(req)
            html = resp.read().decode('utf-8')
            soup = bs(html, 'html.parser')
        except Exception as e:
            k += 1
            if k > 10:
                print('Artist ' + artist_id + ' failed')
                print(e)
                return None
            continue
        try:
            # the top songs are listed in a hidden <ul class="f-hide">
            a = soup.find('ul', class_='f-hide')
            l = a.children
            p = r'\s*\/[a-z]+\?[a-z]+\=([0-9]+)'
            for i in l:
                music_id = re.match(p, i.a['href']).group(1)
                data = (music_id, artist_id)
                t.append(data)
            return t
        except Exception as e:
            print(e)
            print('Artist ' + artist_id + ' failed')
            return None
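
Both functions above repeat the same "count failures, give up after about ten tries" loop. As a rough sketch, that pattern could be factored into a small helper. The name `with_retries` and its parameters are hypothetical and not part of the original code, and the flaky function below only simulates a network error.

```python
import time


def with_retries(fn, max_attempts=10, delay=0.0):
    """Call fn() until it succeeds; return None after max_attempts failures."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return fn()
        except Exception as e:  # broad catch, mirroring the original scripts
            last_error = e
            if delay:
                time.sleep(delay)  # optional pause between attempts
    print('giving up after', max_attempts, 'attempts:', last_error)
    return None


# Simulated request that fails twice before succeeding
calls = {'n': 0}


def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise IOError('temporary failure')
    return 'ok'


result = with_retries(flaky)
print(result, calls['n'])
```

A call such as `with_retries(lambda: get_page(url))` would then replace the hand-written `while True` loop, where `get_page` stands for whatever function performs the request.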

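The next step is to visit each song page by song id and read its comment count, and this is where I ran into trouble. Comments are loaded dynamically, so reading the count straight from the page always gives 0. I therefore borrowed the method from an answer by the Zhihu user 平胸小仙女, shown below: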

# -*- coding: utf-8 -*-
from Crypto.Cipher import AES
import base64
import requests
import json
import codecs
import time
import random

# Proxy IP
proxy_host = '122.72.18.35'
proxy = {'http': proxy_host}

# Request headers (Content-Length is omitted: requests computes it from the body)
headers = {
    'Host': 'music.163.com',
    'Accept': '*/*',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://music.163.com/song?id=347597',
    'Cookie': '__s_=1; _ntes_nnid=f17890f7160fd145486752ebbf2066df,1505221478108; _ntes_nuid=f17890f7160fd145486752ebbf2066df; JSESSIONID-WYYY=Z99pE%2BatJVOAGco1d%2FJpojOK94Xe9GHqe0epcCOj23nqP2SlHt1XwzWQ2FXTwaM2xgIN628qJGj8%2BikzfYkv%2FXAUo%2FSzwMxjdyO9oeQlGKBvH6nYoFpJpVlA%2F8eP57fkZAVEsuB9wqkVgdQc2cjIStE1vyfE6SxKAlA8r0sAgOnEun%2BV%3A1512200032388; _iuqxldmzr_=32; __utma=94650624.1642739310.1512184312.1512184312.1512184312.1; __utmc=94650624; __utmz=94650624.1512184312.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); playerid=10841206',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0'
}

# offset = (comment page - 1) * 20; total is "true" for the first page and "false" for all others
first_param = '{rid:"", offset:"0", total:"true", limit:"20", csrf_token:""}'
second_param = "010001"
third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
forth_param = "0CoJUm6Qyw8W8jud"

# Build the encrypted "params" field
def get_params(page):  # page is the comment page number
    iv = "0102030405060708"
    first_key = forth_param
    second_key = 16 * 'F'
    if page == 1:  # first page
        first_param = '{rid:"", offset:"0", total:"true", limit:"20", csrf_token:""}'
        h_encText = AES_encrypt(first_param, first_key, iv)
    else:
        offset = str((page - 1) * 20)
        first_param = '{rid:"", offset:"%s", total:"%s", limit:"20", csrf_token:""}' % (offset, 'false')
        h_encText = AES_encrypt(first_param, first_key, iv)
    # second round of encryption with the fixed key
    h_encText = AES_encrypt(h_encText, second_key, iv)
    return h_encText

# Return the encSecKey that corresponds to the fixed second key
def get_encSecKey():
    encSecKey = "257348aecb5e556c066de214e531faadd1c55d814f9be95fd06d6bff9f4c7a41f831f6394d5a3fd2e3881736d94a02ca919d952872e7d0a50ebfa1769a7a62d512f5f1ca21aec60bc3819a9c3ffca5eca9a0dba6d6f7249b06f5965ecfff3695b54e1c28f3f624750ed39e7de08fc8493242e26dbc4484a01c76f739e135637c"
    return encSecKey

# AES-CBC encryption (the original comment called this "decryption")
def AES_encrypt(text, key, iv):
    pad = 16 - len(text) % 16
    text = text + pad * chr(pad)  # pad to a multiple of the 16-byte block size
    # PyCryptodome requires bytes for the key, IV and plaintext
    encryptor = AES.new(key.encode('utf-8'), AES.MODE_CBC, iv.encode('utf-8'))
    encrypt_text = encryptor.encrypt(text.encode('utf-8'))
    encrypt_text = base64.b64encode(encrypt_text)
    encrypt_text = str(encrypt_text, encoding="utf-8")  # this conversion is required; without it an error occurs
    return encrypt_text

def get_json(url, params, encSecKey):
    data = {"params": params, "encSecKey": encSecKey}
    response = requests.post(url, headers=headers, data=data, proxies=proxy)
    return response.content

# Entry point for external callers
def get_comments_total(song_id):
    url = 'http://music.163.com/weapi/v1/resource/comments/R_SO_4_' + str(song_id) + '?csrf_token='
    params = get_params(1)
    encSecKey = get_encSecKey()
    json_text = get_json(url, params, encSecKey)
    json_dict = json.loads(json_text)
    comments_num = int(json_dict['total'])
    return comments_num
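
Two details of the script above are easy to miss: how the paging payload changes with the page number, and how the plaintext is padded before AES. The sketch below isolates both using only the standard library, so it runs without `Crypto`. The helper names `build_first_param`, `pad_block` and `unpad_block` are illustrative and do not appear in the original.

```python
def build_first_param(page):
    """Plaintext payload for a comment page: offset = (page - 1) * 20."""
    offset = (page - 1) * 20
    total = 'true' if page == 1 else 'false'  # only the first page asks for the total
    return '{rid:"", offset:"%d", total:"%s", limit:"20", csrf_token:""}' % (offset, total)


def pad_block(text, block_size=16):
    """Pad as AES_encrypt does: append n copies of chr(n), with 1 <= n <= block_size."""
    pad = block_size - len(text) % block_size
    return text + pad * chr(pad)


def unpad_block(text):
    """Reverse pad_block by reading the pad length from the last character."""
    return text[:-ord(text[-1])]


payload = build_first_param(3)
padded = pad_block(payload)
print(payload)
print(len(payload), len(padded))
```

Input that is already a multiple of 16 characters still receives a full extra block of padding, which is what lets the receiver strip the padding unambiguously.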

Finally, the collected records are written to the database one by one.
The complete code is as follows:

# -*- coding: utf-8 -*-
from urllib import request
from bs4 import BeautifulSoup as bs
import re
import mysql.connector
import get_comments_total as gct  # the comment-count script above, saved as get_comments_total.py
import threading

group = ['1001', '1002', '1003', '2001', '2002', '2003', '6001', '6002', '6003', '7001', '7002', '7003', '4001', '4002',
         '4003']

initial = ['0']
for i in range(65, 91):
    initial.append(str(i))

urls = []
for g in group:
    for i in initial:
        url = 'http://music.163.com/discover/artist/cat?id=' + g + '&initial=' + i
        urls.append(url)

# Write rows to the database
def write(L):
    try:
        # replace the password placeholder with your own credentials
        conn = mysql.connector.connect(user='root', password='your_password', database='cloudmusic', charset='utf8')
        cursor = conn.cursor()
        for l in L:
            try:
                cursor.execute(
                    'insert into music(music_id, music_name, artist_id, artist_name, comments) values (%s, %s, %s, %s, %s)',
                    l)
                conn.commit()
            except Exception as e:
                print(e)
                print(l)
                continue
        cursor.close()
        conn.close()
    except Exception as e:
        print(e)
        print(L)

# Return the list of artist ids on one initial-letter sub-page
def get_artist(url):
    k = 0
    t = []
    while True:
        try:
            resp = request.urlopen(url)
            html = resp.read().decode('utf-8')
            soup = bs(html, 'html.parser')
            l = soup.find_all('a', class_='nm nm-icn f-thide s-fc0')
            p = r'\s*\/[a-z]+\?[a-z]+\=([0-9]+)'
            for i in l:
                t.append(re.match(p, i['href']).group(1))
            return t
        except Exception as e:
            print(e)
            k += 1
            if k > 10:
                print('Page ' + url + ' failed')
                return None
            t = []
            continue

# Return the list of top-song ids for one artist
def get_song(artist_id):
    k = 0
    t = []
    while True:
        url = 'http://music.163.com/artist?id=' + artist_id
        try:
            req = request.Request(url)
            req.add_header('User-Agent',
                           'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4399.400 QQBrowser/9.7.12777.400')
            req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
            resp = request.urlopen(req)
            html = resp.read().decode('utf-8')
            soup = bs(html, 'html.parser')
        except Exception as e:
            k += 1
            if k > 10:
                print('Artist ' + artist_id + ' failed')
                print(e)
                return None
            continue
        try:
            a = soup.find('ul', class_='f-hide')
            l = a.children
            p = r'\s*\/[a-z]+\?[a-z]+\=([0-9]+)'
            for i in l:
                music_id = re.match(p, i.a['href']).group(1)
                data = (music_id, artist_id)
                t.append(data)
            return t
        except Exception as e:
            print(e)
            print('Artist ' + artist_id + ' failed')
            return None

# Collect all required fields for one song; return None if it has fewer than 10,000 comments
def get_data(music_id, artist_id):
    k = 0
    while True:
        try:
            comments = gct.get_comments_total(music_id)
            print('Song ' + music_id + ', comments: ' + str(comments))
            if comments < 10000:
                return None
            url = 'http://music.163.com/song?id=' + music_id
            data = []
            req = request.Request(url)
            req.add_header('User-Agent',
                           'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4399.400 QQBrowser/9.7.12777.400')
            req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
            resp = request.urlopen(req)
            html = resp.read().decode('utf-8')
            soup = bs(html, 'html.parser')
            d = soup.find('div', class_='tit')
            p = soup.find('p', class_='des s-fc4')
            music_name = d.em.text
            artist_name = p.span['title']
            data.append(music_id)
            data.append(music_name)
            data.append(artist_id)
            data.append(artist_name)
            data.append(comments)
            return data
        except Exception as e:
            k += 1
            if k > 10:
                print('Song ' + music_id + ' failed')
                return None
            continue

# Fetch the songs of a batch of artists and write the qualifying ones
def get_and_write(artists, name):
    data = []
    for a in artists:
        songs = get_song(a)
        if songs is None:
            continue
        for s in songs:
            d = get_data(s[0], a)
            if d is None:
                continue
            data.append(d)
    if len(data) > 0:
        write(data)

# Thread body for one artist sub-page: hand artists to worker threads in batches of 10
def crawl(url, name):
    L = []
    artists = get_artist(url)
    if artists is None:
        return
    for a in artists:
        L.append(a)
        if len(L) > 9:
            t = threading.Thread(target=get_and_write, args=(L, ''))
            t.start()
            L = []
    # remaining artists (fewer than 10)
    t = threading.Thread(target=get_and_write, args=(L, ''))
    t.start()

# Entry point: crawl sub-pages start..end (1-based, inclusive), one thread per page
def threads_crawl(start, end):
    L = []
    for i in range(start - 1, end):
        t = threading.Thread(target=crawl, args=(urls[i], ''))
        L.append(t)
    for t in L:
        t.start()
    for t in L:
        t.join()

threads_crawl(1, 405)
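
The script above starts one thread per sub-page (405 at once), and each of those starts further threads per batch of ten artists. If that puts too much load on the machine or the site, one possible alternative is a bounded pool from `concurrent.futures`. This is only a sketch under that assumption: `crawl_all` and `fake_worker` are hypothetical names, and the worker below does no network access.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(page_urls, worker, max_workers=8):
    """Run worker(url) for every URL with at most max_workers threads at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as page_urls
        return list(pool.map(worker, page_urls))


def fake_worker(url):
    """Stand-in for crawl(url, ''); returns the URL length instead of crawling."""
    return len(url)


demo_urls = ['http://music.163.com/discover/artist/cat?id=1001&initial=%d' % i
             for i in range(65, 70)]
results = crawl_all(demo_urls, fake_worker, max_workers=2)
print(results)
```

With the real script, `worker` would be a function that calls `get_artist` and then `get_and_write` for one URL, and `max_workers` would set the upper bound on concurrent requests.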
