
Scraping Baidu Index + pyppeteer login (solving the rotation captcha)

Published: 2023/12/16 | Programming Q&A | 豆豆

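The points on Baidu Index's trend lines are encrypted using two strings.

The data endpoint returns a data value, which serves as e, and a uniqid, which is used to request the t value.

Once both are available, they are passed through a processing function called decrypt.

Plugging t and e into decrypt produces exactly the numbers we want. Here is a Python version: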

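def decrypt_py(t, e):
    """
    :param t: key string; its first half holds the cipher characters, its second half their plain-text values
    :param e: encrypted data string
    :return: the decoded data
    """
    a = dict()
    length = int(len(t) / 2)
    # Map each cipher character to the character at the same position in the second half
    for o in range(length):
        a[t[o]] = t[length + o]
    # Substitute every character of e, then split the result into individual values
    r = "".join([a[each] for each in e]).split(",")
    return r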
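decrypt_py is a plain character-substitution lookup: the first half of t lists the cipher characters and the second half lists what each one stands for. The toy key below is made up for illustration (a real t comes from the ptbk endpoint), and make_toy_key is a hypothetical helper; the sketch just shows how such a key turns an encrypted e back into comma-separated numbers.

```python
def decrypt_py(t: str, e: str):
    """Decode e with key t: t[:half] are cipher chars, t[half:] their plain-text values."""
    half = len(t) // 2
    table = dict(zip(t[:half], t[half:]))  # same mapping as the loop in the original
    return "".join(table[ch] for ch in e).split(",")


def make_toy_key(cipher_chars: str, plain_chars: str) -> str:
    """Hypothetical helper: build a t-style key by concatenating both halves."""
    assert len(cipher_chars) == len(plain_chars)
    return cipher_chars + plain_chars


# Made-up key: 'a'->'0', 'b'->'1', ..., 'j'->'9', 'k'->','
toy_t = make_toy_key("abcdefghijk", "0123456789,")
toy_e = "bcdkefgkh"  # encodes "123,456,7"
print(decrypt_py(toy_t, toy_e))  # ['123', '456', '7']
```

Because every character of e is looked up in the table, an e produced with a different uniqid's key will raise KeyError or decode to garbage, which is why t must always be fetched for the same response.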

The province and city names are stored in dictionaries and looked up from there:

# baidu_id.py
city = {
    1:"濟南",2:"貴陽",3:"黔南",4:"六盤水",5:"南昌",6:"九江",7:"鷹潭",8:"撫州",9:"上饒",10:"贛州",11:"重慶",13:"包頭",14:"鄂爾多斯",15:"巴彥淖爾",16:"烏海",
    17:"阿拉善盟",19:"錫林郭勒盟",20:"呼和浩特",21:"赤峰",22:"通遼",25:"呼倫貝爾",28:"武漢",29:"大連",30:"黃石",31:"荊州",32:"襄陽",33:"黃岡",34:"荊門",35:"宜昌",36:"十堰",
    37:"隨州",38:"恩施",39:"鄂州",40:"咸寧",41:"孝感",42:"仙桃",43:"長沙",44:"岳陽",45:"衡陽",46:"株洲",47:"湘潭",48:"益陽",49:"郴州",50:"福州",51:"莆田",
    52:"三明",53:"龍巖",54:"廈門",55:"泉州",56:"漳州",57:"上海",59:"遵義",61:"黔東南",65:"湘西",66:"婁底",67:"懷化",68:"常德",73:"天門",74:"潛江",76:"濱州",
    77:"青島",78:"煙臺",79:"臨沂",80:"濰坊",81:"淄博",82:"東營",83:"聊城",84:"菏澤",85:"棗莊",86:"德州",87:"寧德",88:"威海",89:"柳州",90:"南寧",91:"桂林",
    92:"賀州",93:"貴港",94:"深圳",95:"廣州",96:"宜賓",97:"成都",98:"綿陽",99:"廣元",100:"遂寧",101:"巴中",102:"內江",103:"瀘州",104:"南充",106:"德陽",107:"樂山",
    108:"廣安",109:"資陽",111:"自貢",112:"攀枝花",113:"達州",114:"雅安",115:"吉安",117:"昆明",118:"玉林",119:"河池",123:"玉溪",124:"楚雄",125:"南京",126:"蘇州",127:"無錫",
    128:"北海",129:"欽州",130:"防城港",131:"百色",132:"梧州",133:"東莞",134:"麗水",135:"金華",136:"萍鄉",137:"景德鎮",138:"杭州",139:"西寧",140:"銀川",141:"石家莊",143:"衡水",
    144:"張家口",145:"承德",146:"秦皇島",147:"廊坊",148:"滄州",149:"溫州",150:"沈陽",151:"盤錦",152:"哈爾濱",153:"大慶",154:"長春",155:"四平",156:"連云港",157:"淮安",158:"揚州",
    159:"泰州",160:"鹽城",161:"徐州",162:"常州",163:"南通",164:"天津",165:"西安",166:"蘭州",168:"鄭州",169:"鎮江",172:"宿遷",173:"銅陵",174:"黃山",175:"池州",176:"宣城",
    177:"巢湖",178:"淮南",179:"宿州",181:"六安",182:"滁州",183:"淮北",184:"阜陽",185:"馬鞍山",186:"安慶",187:"蚌埠",188:"蕪湖",189:"合肥",191:"遼源",194:"松原",195:"云浮",
    196:"佛山",197:"湛江",198:"江門",199:"惠州",200:"珠海",201:"韶關",202:"陽江",203:"茂名",204:"潮州",205:"揭陽",207:"中山",208:"清遠",209:"肇慶",210:"河源",211:"梅州",
    212:"汕頭",213:"汕尾",215:"鞍山",216:"朝陽",217:"錦州",218:"鐵嶺",219:"丹東",220:"本溪",221:"營口",222:"撫順",223:"阜新",224:"遼陽",225:"葫蘆島",226:"張家界",227:"大同",
    228:"長治",229:"忻州",230:"晉中",231:"太原",232:"臨汾",233:"運城",234:"晉城",235:"朔州",236:"陽泉",237:"呂梁",239:"海口",241:"萬寧",242:"瓊海",243:"三亞",244:"儋州",
    246:"新余",253:"南平",256:"宜春",259:"保定",261:"唐山",262:"南陽",263:"新鄉",264:"開封",265:"焦作",266:"平頂山",268:"許昌",269:"永州",270:"吉林",271:"銅川",272:"安康",
    273:"寶雞",274:"商洛",275:"渭南",276:"漢中",277:"咸陽",278:"榆林",280:"石河子",281:"慶陽",282:"定西",283:"武威",284:"酒泉",285:"張掖",286:"嘉峪關",287:"臺州",288:"衢州",
    289:"寧波",291:"眉山",292:"邯鄲",293:"邢臺",295:"伊春",297:"大興安嶺",300:"黑河",301:"鶴崗",302:"七臺河",303:"紹興",304:"嘉興",305:"湖州",306:"舟山",307:"平涼",308:"天水",
    309:"白銀",310:"吐魯番",311:"昌吉",312:"哈密",315:"阿克蘇",317:"克拉瑪依",318:"博爾塔拉",319:"齊齊哈爾",320:"佳木斯",322:"牡丹江",323:"雞西",324:"綏化",331:"烏蘭察布",333:"興安盟",334:"大理",
    335:"昭通",337:"紅河",339:"曲靖",342:"麗江",343:"金昌",344:"隴南",346:"臨夏",350:"臨滄",352:"濟寧",353:"泰安",356:"萊蕪",359:"雙鴨山",366:"日照",370:"安陽",371:"駐馬店",
    373:"信陽",374:"鶴壁",375:"周口",376:"商丘",378:"洛陽",379:"漯河",380:"濮陽",381:"三門峽",383:"阿勒泰",384:"喀什",386:"和田",391:"亳州",395:"吳忠",396:"固原",401:"延安",
    405:"邵陽",407:"通化",408:"白山",410:"白城",417:"甘孜",422:"銅仁",424:"安順",426:"畢節",437:"文山",438:"保山",456:"東方",457:"阿壩",466:"拉薩",467:"烏魯木齊",472:"石嘴山",
    479:"涼山",480:"中衛",499:"巴音郭楞",506:"來賓",514:"北京",516:"日喀則",520:"伊犁",525:"延邊",563:"塔城",582:"五指山",588:"黔西南",608:"海西",652:"海東",653:"克孜勒蘇柯爾克孜",654:"天門仙桃",
    655:"那曲",656:"林芝",657:"None",658:"防城",659:"玉樹",660:"伊犁哈薩克",661:"五家渠",662:"思茅",663:"香港",664:"澳門",665:"崇左",666:"普洱",667:"濟源",668:"西雙版納",669:"德宏",
    670:"文昌",671:"怒江",672:"迪慶",673:"甘南",674:"陵水黎族自治縣",675:"澄邁縣",676:"海南",677:"山南",678:"昌都",679:"樂東黎族自治縣",680:"臨高縣",681:"定安縣",682:"海北",683:"昌江黎族自治縣",684:"屯昌縣",
    685:"黃南",686:"保亭黎族苗族自治縣",687:"神農架",688:"果洛",689:"白沙黎族自治縣",690:"瓊中黎族苗族自治縣",691:"阿里",692:"阿拉爾",693:"圖木舒克",
}

province = {
    901:"山東",902:"貴州",903:"江西",904:"重慶",905:"內蒙古",906:"湖北",907:"遼寧",908:"湖南",909:"福建",910:"上海",911:"北京",912:"廣西",
    913:"廣東",914:"四川",915:"云南",916:"江蘇",917:"浙江",918:"青海",919:"寧夏",920:"河北",921:"黑龍江",922:"吉林",923:"天津",924:"陜西",
    925:"甘肅",926:"新疆",927:"河南",928:"安徽",929:"山西",930:"海南",931:"臺灣",932:"西藏",933:"香港",934:"澳門",
}
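getRegion in the script below turns the numeric keys of the region response into names with city[int(city_n)], which raises KeyError for any ID missing from these tables. A slightly more defensive lookup could fall back to the raw ID and sort by count. This is a sketch: name_counts is a hypothetical helper, the response fragment is made up, and it assumes the keys are numeric strings and the values are numbers, as that code implies.

```python
# Small excerpt of the tables above, enough for the demo.
city = {1: "濟南", 94: "深圳", 514: "北京"}
province = {901: "山東", 911: "北京", 913: "廣東"}


def name_counts(counts: dict, table: dict, label: str) -> list:
    """Map {"<id>": number} to [{label: name, "number": number}], largest first.

    IDs missing from the table are kept as their raw string instead of raising KeyError.
    """
    rows = [{label: table.get(int(code), code), "number": n} for code, n in counts.items()]
    return sorted(rows, key=lambda row: row["number"], reverse=True)


# Made-up fragment shaped like region['city'] / region['prov'] in getRegion.
region = {"city": {"1": 120, "94": 300, "9999": 5}, "prov": {"901": 150, "913": 400}}
print(name_counts(region["city"], city, "city"))
print(name_counts(region["prov"], province, "prov"))
```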

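These province and city tables can be taken from a JS file: while on the crowd-profile (人群画像) page, search the browser's Network panel for a place name and a JS file will turn up. Open it and search again, and you will find a long list of cities.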
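Rather than copying the names by hand, the id/name pairs could be pulled out of that JS file with a regular expression. The file's exact layout is not shown in this post, so the pattern below is an assumption: it expects pairs shaped like 1:"濟南" (the same shape as the dictionaries above), optionally with a quoted key. The JS fragment is made up and parse_id_names is a hypothetical helper; adjust the regex to whatever the real file contains.

```python
import re

# Assumed pair shape: <digits>:"<name>" or "<digits>":"<name>", spaces allowed around the colon.
PAIR_RE = re.compile(r'"?(\d+)"?\s*:\s*"([^"]+)"')


def parse_id_names(js_text: str) -> dict:
    """Return {int_id: name} for every id/name pair found in js_text."""
    return {int(num): name for num, name in PAIR_RE.findall(js_text)}


# Made-up JS fragment for illustration only.
sample_js = 'var cityMap={1:"濟南",2:"貴陽", 514 : "北京"};'
print(parse_id_names(sample_js))  # {1: '濟南', 2: '貴陽', 514: '北京'}
```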

With that in place we can get to work. You need to log in first and grab the cookie from your browser.
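Since the login is done with pyppeteer anyway, the cookie does not have to be copied by hand: pyppeteer's page.cookies() returns a list of dicts that include name and value keys, which can be joined into the Cookie header that get_rep_json sends. The helper name is hypothetical, the cookie names and values in the demo are made up, and the browser part is only sketched in comments because it needs a live, logged-in session.

```python
def cookies_to_header(cookies: list) -> str:
    """Join pyppeteer-style cookie dicts ({'name': ..., 'value': ...}) into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# With a logged-in pyppeteer page (not run here):
#     cookies = await page.cookies()
#     headers = {"Cookie": cookies_to_header(cookies), "User-Agent": "..."}

# Offline demo with made-up cookies:
fake = [{"name": "foo", "value": "abc"}, {"name": "bar", "value": "xyz"}]
print(cookies_to_header(fake))  # foo=abc; bar=xyz
```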

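(Updated on 5.20 to match the API changes.)

I've noticed that some readers get stuck when the endpoint changes. I recommend an encoding-conversion website: decode the URL first and it becomes much easier to read.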

http://tool.chinaz.com/tools/urlencode.aspx
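The same decoding can also be done locally with Python's urllib.parse instead of the website. The sketch below rebuilds the word parameter that getIndex and getFeedIndex embed in their URLs (json.dumps with compact separators yields exactly the [[{"name":...,"wordType":1}]] string used in the code) and shows that percent-encoding and decoding round-trip. build_word_param is a hypothetical helper name.

```python
import json
from urllib.parse import quote, unquote


def build_word_param(word: str) -> str:
    """Build the [[{"name": ..., "wordType": 1}]] value used by the index endpoints."""
    return json.dumps([[{"name": word, "wordType": 1}]], ensure_ascii=False, separators=(",", ":"))


raw = build_word_param("我和我的祖國")
print(raw)  # [[{"name":"我和我的祖國","wordType":1}]]

encoded = quote(raw)  # percent-encoded form, like the query strings the browser shows
print(unquote(encoded) == raw)  # True: decoding restores the readable query
```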

import requests
import datetime
from utils.baidu_id import province, city


def getIndex(word="我和我的祖國"):
    """
    Search index
    :param word:
    :return:
    """
    insert_word = """[[{"name":"%s","wordType":1}]]""" % word
    url = f"http://index.baidu.com/api/SearchApi/index?word={insert_word}&area=0&days=30"
    rep_json = get_rep_json(url)
    generalRatio = rep_json['data']['generalRatio']
    uniqid = rep_json['data']['uniqid']
    all_index_e = rep_json['data']['userIndexes'][0]['all']['data']
    pc_index_e = rep_json['data']['userIndexes'][0]['pc']['data']
    wise_index_e = rep_json['data']['userIndexes'][0]['wise']['data']
    t = getPtbk(uniqid)
    startDate = rep_json['data']['userIndexes'][0]['wise']['startDate']
    all_news = getTopNews(decrypt_py(t, all_index_e), startDate, word)
    pc_news = getTopNews(decrypt_py(t, pc_index_e), startDate, word)
    wise_news = getTopNews(decrypt_py(t, wise_index_e), startDate, word)
    for each in (all_news, pc_news, wise_news):
        print(each)
    return None


def getFeedIndex(word="我和我的祖國"):
    """
    :param word: keyword
    :return: feed index
    """
    insert_word = """[[{"name":"%s","wordType":1}]]""" % word
    url = "http://index.baidu.com/api/FeedSearchApi/getFeedIndex?word=%s&area=0&days=30" % insert_word
    feed_index_data = get_rep_json(url)
    uniqid = feed_index_data['data']['uniqid']
    data = feed_index_data["data"]['index'][0]
    generalRatio = data['generalRatio']  # feed index overview
    e = data['data']
    t = getPtbk(uniqid)
    return decrypt_py(t, e)


def getNewsDate(word="我和我的祖國"):
    """
    :param word:
    :return: news at the peaks of the media index
    """
    insert_word = """[[{"name":"%s","wordType":1}]]""" % word
    url = f"http://index.baidu.com/api/NewsApi/getNewsIndex?area=0&word={insert_word}&days=30"
    res_json = get_rep_json(url)['data']
    generalRatio = res_json["index"][0]['generalRatio']
    e = res_json['index'][0]['data']
    start_date = res_json['index'][0]['startDate']
    t = getPtbk(res_json['uniqid'])
    news = getTopNews(decrypt_py(t, e), start_date, word)
    return news


def getTopNews(numList: list, start_date, word):
    """
    Find the peaks in the index list, convert them to date strings,
    and pass the joined dates to the data endpoint to get the news.
    :param numList: list of index values
    :param start_date: start date
    :param word:
    :return: news at the peaks
    """
    start_date = string_toDatetime(start_date)
    hill_tops = getHilltop(numList)
    hill_tops_date = [datetime_toString(start_date + datetime.timedelta(days=index)) for index in hill_tops]
    news = getNews(",".join(hill_tops_date), word)["data"][word]
    return news


def getNews(dts, word):
    """
    Fetch media-index data
    :param dts: comma-joined date strings, e.g. dts=2019-10-06,2019-10-10,2019-10-12,2019-10-16,2019-10-21,2019-10-24
    :param word:
    :return: data returned by the endpoint
    """
    url = f"http://index.baidu.com/api/NewsApi/checkNewsIndex?dates[]={dts}&type=day&words={word}"
    return get_rep_json(url)


def getHilltop(numList: list):
    """
    :param numList: a list of numbers
    :return: indices of the peaks
    """
    numList = list(map(lambda x: float(x) if x else 0, numList))
    hillTops = [index for index, each in enumerate(numList)
                if index and index < len(numList) - 1
                and each > numList[index - 1] and each > numList[index + 1]]
    return hillTops


def getMulti(word="我和我的祖國"):
    """
    Demand graph
    pv: search popularity; ratio: rate of change; sim: relevance
    """
    url = f"http://index.baidu.com/api/WordGraph/multi?wordlist%5B%5D={word}"
    word_data = get_rep_json(url)['data']['wordlist'][0]
    if word_data['keyword']:
        print(word_data['wordGraph'])


def getRegion(word="我和我的祖國", startDate='2019-09-17', endDate='2019-10-17'):
    """Regional distribution"""
    url = f"http://index.baidu.com/api/SearchApi/region?region=0&word={word}&startDate={startDate}&endDate={endDate}"
    region = get_rep_json(url)['data']['region'][0]
    region_city = [{'city': city[int(city_n)], 'number': region['city'][city_n]} for city_n in region['city']]
    region_prov = [{'prov': province[int(prov_n)], 'number': region['prov'][prov_n]} for prov_n in region['prov']]
    print(region_city, region_prov)


def getBaseAttributes(word="我和我的祖國"):
    """Demographic attributes"""
    url = f"http://index.baidu.com/api/SocialApi/baseAttributes?wordlist[]={word}"
    rep_data = get_rep_json(url)['data']['result']
    return rep_data


def getInterest(word="我和我的祖國"):
    """Interest distribution"""
    url = f"http://index.baidu.com/api/SocialApi/interest?wordlist[]={word}"
    rep_data = get_rep_json(url)['data']['result']
    return rep_data


def string_toDatetime(string):
    # Convert a string to a datetime
    return datetime.datetime.strptime(string, "%Y-%m-%d")


def datetime_toString(dt):
    # Convert a datetime to a string
    return dt.strftime("%Y-%m-%d")


def getPtbk(uniqid):
    # Fetch the decryption key t for this uniqid
    url = f"http://index.baidu.com/Interface/ptbk?uniqid={uniqid}"
    return get_rep_json(url)['data']


def decrypt_py(t, e):
    """
    :param t: key string
    :param e: encrypted data string
    :return: the decoded data
    """
    a = dict()
    length = int(len(t) / 2)
    for o in range(length):
        a[t[o]] = t[length + o]
    r = "".join([a[each] for each in e]).split(",")
    print(r)
    return r


def get_rep_json(url):
    """
    Fetch JSON
    :param url: endpoint to request
    :return:
    """
    headers = {
        "Cookie": '',  # fill in the cookie from your browser
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"
    }
    response = requests.get(url, headers=headers, timeout=10)
    response_data = response.json()
    # print(response_data)
    return response_data


def main():
    getFeedIndex()
    getNewsDate()
    getIndex()
    getRegion()
    getBaseAttributes()
    getInterest()


if __name__ == "__main__":
    main()
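The peak-news logic in getHilltop and getTopNews can be checked offline: a point counts as a peak when it is strictly greater than both neighbours (the first and last points never qualify, and empty strings count as 0), and each peak index is added as a day offset to startDate. This is a standalone re-implementation for testing, with a made-up series; the joined string is what getNews receives as dts.

```python
import datetime


def get_hilltops(values: list) -> list:
    """Indices of interior points strictly greater than both neighbours (empty strings count as 0)."""
    nums = [float(v) if v else 0.0 for v in values]
    return [i for i in range(1, len(nums) - 1) if nums[i] > nums[i - 1] and nums[i] > nums[i + 1]]


def peak_dates(values: list, start_date: str) -> list:
    """Turn peak indices into YYYY-MM-DD strings, counting days from start_date."""
    start = datetime.datetime.strptime(start_date, "%Y-%m-%d")
    return [(start + datetime.timedelta(days=i)).strftime("%Y-%m-%d") for i in get_hilltops(values)]


series = ["1", "3", "2", "5", "", "4"]  # made-up decrypted values
print(get_hilltops(series))  # [1, 3]
print(",".join(peak_dates(series, "2019-09-17")))  # 2019-09-18,2019-09-20
```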

The topic pages have endpoints too. I looked them up and they work the same way, so if you're interested, give them a try yourself.

Topic endpoints:
Topic search index: http://insight.baidu.com/base/search/trend/general?id=23734&dateType=30&filterType=1&source=0 (source=1 for PC, source=2 for mobile)
Topic feed: http://index.baidu.com/Interface/Newwordgraph/getTopicFeed?nodeid=23935
Topic feed key (ptbkTopic): http://index.baidu.com/Interface/api/ptbkTopic?uniqid=5dad242a566a46.43359139
Topic video: /api/videoIndex/getVideoIndex?nodeid=23935
Topic video key (ptbkTopic): http://index.baidu.com/Interface/api/ptbkTopic?uniqid=5dad242a71d612.53363283
Brand attention: http://insight.baidu.com/base/search/topic/attentionBrand?id=23734
Search regional distribution: http://insight.baidu.com/base/search/region/general?id=23734&dateType=30&filterType=1&pageSize=40
Demographic attributes: http://insight.baidu.com/base/search/Topic/baseAttributes?nodeid=23734
Interest distribution: http://insight.baidu.com/base/search/Topic/interest?nodeid=23734&typeid=
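If you want to experiment with the topic endpoints, the URLs above can be templated on the topic id or nodeid. The builders below only reproduce URLs listed in this post (23734 and 23935 are the IDs from those examples); the function names are hypothetical, and whether these endpoints still respond, and with what, is not verified here. Per the note that "they work the same way", the feed and video responses presumably carry a uniqid that is exchanged at ptbkTopic for a key decoded like decrypt_py.

```python
from urllib.parse import urlencode

INSIGHT = "http://insight.baidu.com/base/search"
INDEX = "http://index.baidu.com/Interface"


def topic_trend_url(topic_id: int, source: int = 0, date_type: int = 30) -> str:
    """Topic search index; the post lists source=1 for PC and source=2 for mobile."""
    query = urlencode({"id": topic_id, "dateType": date_type, "filterType": 1, "source": source})
    return f"{INSIGHT}/trend/general?{query}"


def topic_feed_url(nodeid: int) -> str:
    """Topic feed data for a node."""
    return f"{INDEX}/Newwordgraph/getTopicFeed?{urlencode({'nodeid': nodeid})}"


def ptbk_topic_url(uniqid: str) -> str:
    """Key lookup for a topic response, by the uniqid that response carries."""
    return f"{INDEX}/api/ptbkTopic?{urlencode({'uniqid': uniqid})}"


print(topic_trend_url(23734, source=1))
print(topic_feed_url(23935))
```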

Simulated login with the rotation captcha solved

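I'm not who I used to be: now I can actually get this done.

There's no mountain in the world that can't be climbed; if there is one, stand on the shoulders of giants and climb it again.

Here I come, here I come, bringing my model with me! Are you all still struggling with rotation captchas? From now on you can go struggle with something else!!!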

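Come on, take a look at the results.

Pretty satisfying, right? This post is already long enough that there's no room left for me to sing its praises, so I wrote it up in a separate blog post:

Rotate-and-drag captcha solution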
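The model-based solution lives in that other post. As rough orientation only, one commonly used approach for this kind of rotate-by-dragging captcha (an assumption, not taken from the author's write-up) converts a predicted rotation angle into a slider offset proportional to the usable track length, then drags the handle there with pyppeteer's page.mouse. The sketch covers just that arithmetic plus an eased trajectory; the linear mapping, track and handle widths, and easing curve are hypothetical and would need calibrating against the real widget.

```python
def angle_to_distance(angle_deg: float, track_px: float, slider_px: float) -> float:
    """Hypothetical linear mapping: a full 360-degree turn spans the usable track length."""
    usable = track_px - slider_px  # the handle cannot travel past the end of the track
    return (angle_deg % 360) / 360 * usable


def ease_out_track(distance: float, steps: int = 20) -> list:
    """Cumulative x offsets that move fast at first and slow down near `distance` (ease-out cubic)."""
    return [distance * (1 - (1 - k / steps) ** 3) for k in range(1, steps + 1)]


# Hypothetical numbers: model predicts 90 degrees, 300px track, 40px slider handle.
dist = angle_to_distance(90, track_px=300, slider_px=40)
track = ease_out_track(dist, steps=10)
print(dist, [round(x, 1) for x in track])

# Replaying the offsets with pyppeteer (not run here; x0, y0 = centre of the slider handle):
#     await page.mouse.move(x0, y0)
#     await page.mouse.down()
#     for dx in track:
#         await page.mouse.move(x0 + dx, y0)
#     await page.mouse.up()
```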

If anything here is wrong, please point it out! Alright, have fun!
