Decoding the Font-Obfuscated Monthly-Ticket Counts on Qidian (起点中文网), with Pagination as a Bonus
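I hadn't written any code in a while and wanted to warm up, so I picked Qidian (起点中文网) as the target (●ˇ∀ˇ●)
Enough preamble; on to the code.
First, let's look at how the site is put together: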
https://www.qidian.com/rank/yuepiao/year2022-month01/
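As usual, open the page, press F12, and switch to the Network tab.
We need to locate the data we want to scrape. Today that is each book's title and its monthly-ticket count.
Click the request for the page itself (marked with an arrow in the original screenshot) and check its Preview tab: the data we want is not there. Then check the Response tab: searching with Ctrl+F for 星門 shows that the data is in the raw response.
That gives us the titles. The code to fetch them: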
import requests
from lxml import etree

url = 'https://www.qidian.com/rank/yuepiao/year2022-month01/'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36',
    'referer': 'https://www.qidian.com/rank/',
    'cookie': 'e1=%7B%22pid%22%3A%22qd_P_rank_01%22%2C%22eid%22%3A%22qd_C19%22%2C%22l1%22%3A4%7D; e2=%7B%22pid%22%3A%22qd_P_rank_01%22%2C%22eid%22%3A%22%22%2C%22l1%22%3A4%7D; _yep_uuid=fd95b6b7-090e-c6e5-cb8c-b8387e5b29ab; _ga=GA1.1.376581816.1643601078; newstatisticUUID=1643601078_1599172947; _csrfToken=m8mDkhtjc381bOHrIGiYTkE1g3bUzgPZjExmmO9l; _ga_FZMMH98S83=GS1.1.1643601077.1.1.1643601098.0; _ga_PFYW0QLV3P=GS1.1.1643601077.1.1.1643601098.0',
}

# Fetch the ranking page and parse the HTML
response = requests.get(url, headers=headers)
response_text = response.text
html_data = etree.HTML(response_text)

# Each book title is the text of the <a> inside an <h2>
title_list = html_data.xpath('//h2/a/text()')
print(title_list)
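Running this prints the titles of every novel on the first page, as a list.
We also want the monthly-ticket count for each of these novels.
The counts are not present as plain digits in the source, so let's first grab whatever is there instead: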
import re

# Capture the contents of the <span> that follows each inline <style> block
re_data = re.findall('</style><span class=".*?">(.*?)</span>', response_text)
print(re_data)
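To see what this pattern captures without requesting the page, here is a minimal offline sketch. The HTML fragment, the class name `AbCdEf`, and the code points are made up for illustration; the assumption (consistent with the later `\d+` step) is that the span holds numeric character references.

```python
import html
import re

# Hypothetical fragment shaped like the ranking page's markup; the class
# name and the code points below are invented for this example.
sample_html = (
    "<style>@font-face { font-family: AbCdEf; }</style>"
    '<span class="AbCdEf">&#100498;&#100496;&#100500;</span>'
)

# Same pattern as in the article: the <span> right after an inline <style>
raw = re.findall(r'</style><span class=".*?">(.*?)</span>', sample_html)

# The digits inside each reference are the code points the font will render
code_points = [int(n) for n in re.findall(r"\d+", raw[0])]

# Unescaping yields real characters, but they carry no digit meaning by
# themselves: only the page's custom font draws them as numbers.
unescaped = [ord(ch) for ch in html.unescape(raw[0])]

print(raw)
print(code_points)
```

The point of the sketch is that the visible number never appears in the HTML at all, only code points whose appearance is decided by the font.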
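The printed values look nothing like the numbers shown on the page. My guess was that the counts are obfuscated with a custom font, and to check that I looked at the font definition and found a src URL.
Worse, that src is dynamic (so much for my peace of mind): a new one is generated every time the page is loaded, which I confirmed by comparing the font requests in the Network tab across loads.
So the code to get the current font URL is: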
# The WOFF URL sits in the inline @font-face rule, right after the EOT source
font_url = re.findall(r"format\('eot'\); src: url\('(.*?)'\) format\('woff'\)", response_text)[0]
print(font_url)
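Indexing `findall(...)[0]` raises `IndexError` if the page layout changes and the pattern stops matching. A slightly more defensive variant might look like the sketch below. The helper name `extract_woff_url` and the sample `@font-face` rule (including the example.com URLs) are hypothetical, not taken from the site.

```python
import re
from typing import Optional

# Same pattern as in the article, compiled once
WOFF_PATTERN = re.compile(
    r"format\('eot'\); src: url\('(.*?)'\) format\('woff'\)"
)


def extract_woff_url(page_source: str) -> Optional[str]:
    """Return the WOFF URL from the inline @font-face rule, or None."""
    match = WOFF_PATTERN.search(page_source)
    return match.group(1) if match else None


# Hypothetical rule shaped like the one the pattern expects
sample_css = (
    "@font-face { font-family: AbCdEf; "
    "src: url('https://example.com/AbCdEf.eot?') format('eot'); "
    "src: url('https://example.com/AbCdEf.woff') format('woff'), "
    "url('https://example.com/AbCdEf.ttf') format('truetype'); }"
)

print(extract_woff_url(sample_css))
print(extract_woff_url("<html>no font rule here</html>"))
```

Returning `None` lets the caller decide whether to skip the page or stop, instead of crashing in the middle of a multi-page run.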
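From here the approach is clear: download that font file and use its character map to decode the obfuscated values in the page source.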
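from fontTools.ttLib import TTFont

# Download the font that this response refers to
font_response = requests.get(font_url, headers=headers)
with open('jiemi.woff', 'wb') as f:
    f.write(font_response.content)

font_obj = TTFont('jiemi.woff')
font_obj.saveXML('jiemi.xml')  # XML dump, handy for inspecting the font by hand
cmap_dict = font_obj.getBestCmap()  # {code point: glyph name}
print("Font cmap (code point -> glyph name):", cmap_dict)

# Keep only the numeric code points of each obfuscated string
for index, raw in enumerate(re_data):
    re_data[index] = re.findall(r'\d+', raw)
print("Code points only:", re_data)

# The glyph names are English number words; map them to digit characters
dict_e_a = {"one": '1', "two": '2', "three": '3', "four": '4', "five": "5",
            "six": '6', "seven": "7", "eight": '8', "nine": '9', "zero": '0'}
for code_point in cmap_dict:
    glyph_name = cmap_dict[code_point]
    if glyph_name in dict_e_a:
        cmap_dict[code_point] = dict_e_a[glyph_name]
print("Cmap after replacing glyph names with digits:", cmap_dict)

# Replace every code point with the digit it renders as
for codes in re_data:
    for position, code in enumerate(codes):
        if int(code) in cmap_dict:
            codes[position] = cmap_dict[int(code)]
print("Decoded monthly-ticket counts:", re_data)

# Join the digits of each entry into a single string
list_ = [''.join(digits) for digits in re_data]
print("Final plain-text monthly-ticket counts:", list_)

# Pair each title with its count
rank_dict = {}
for i in range(len(title_list)):
    rank_dict[title_list[i]] = list_[i]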
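The decoding step can be tried offline, without fontTools or a network request, by standing in a hand-written dictionary for the font's cmap. In this sketch the function name `decode_votes`, the sample cmap, and the code points are all assumptions made for illustration; a real cmap would come from `getBestCmap()`.

```python
import re

# English glyph names mapped to the digit characters they draw
GLYPH_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def decode_votes(raw: str, cmap: dict) -> str:
    """Decode a run of numeric character references into a digit string.

    `cmap` maps code point (int) -> glyph name (str). Code points that
    do not resolve to a digit glyph are rendered as '?'.
    """
    code_points = [int(n) for n in re.findall(r"\d+", raw)]
    return "".join(
        GLYPH_TO_DIGIT.get(cmap.get(cp, ""), "?") for cp in code_points
    )


# Made-up cmap standing in for one downloaded font
sample_cmap = {
    100496: "three",
    100498: "one",
    100500: "seven",
    100501: "period",
}

print(decode_votes("&#100498;&#100496;&#100500;", sample_cmap))  # 137
```

Because the font changes between requests, a cmap taken from one response should only be used to decode the strings from that same response.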
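That wasn't enough for me, so I also added multi-page support. Paging itself is easy; the decoding is the fiddly part. Compare the URLs of the first three pages:
Page 1: https://www.qidian.com/rank/yuepiao/year2022-month01/
Page 2: https://www.qidian.com/rank/yuepiao/year2022-month01-page2/
Page 3: https://www.qidian.com/rank/yuepiao/year2022-month01-page3/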
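The URL pattern above can be captured in a small helper. This is only a sketch of the three URLs listed; the name `page_url` is hypothetical, and whether the pattern holds for other months or higher page numbers is an assumption.

```python
BASE = "https://www.qidian.com/rank/yuepiao/year2022-month01"


def page_url(page: int) -> str:
    """Build the ranking URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 has no suffix; later pages append "-page<N>"
    return f"{BASE}/" if page == 1 else f"{BASE}-page{page}/"


for n in (1, 2, 3):
    print(page_url(n))
```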
With the pattern clear, the complete multi-page code is:
import random
import re
import time

import requests
from fontTools.ttLib import TTFont
from lxml import etree

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36',
    'referer': 'https://www.qidian.com/rank/',
    'cookie': 'e1=%7B%22pid%22%3A%22qd_P_rank_01%22%2C%22eid%22%3A%22qd_C19%22%2C%22l1%22%3A4%7D; e2=%7B%22pid%22%3A%22qd_P_rank_01%22%2C%22eid%22%3A%22%22%2C%22l1%22%3A4%7D; _yep_uuid=fd95b6b7-090e-c6e5-cb8c-b8387e5b29ab; _ga=GA1.1.376581816.1643601078; newstatisticUUID=1643601078_1599172947; _csrfToken=m8mDkhtjc381bOHrIGiYTkE1g3bUzgPZjExmmO9l; _ga_FZMMH98S83=GS1.1.1643601077.1.1.1643601098.0; _ga_PFYW0QLV3P=GS1.1.1643601077.1.1.1643601098.0',
}
# Glyph names are English number words; map them to digit characters
dict_e_a = {"one": '1', "two": '2', "three": '3', "four": '4', "five": "5",
            "six": '6', "seven": "7", "eight": '8', "nine": '9', "zero": '0'}

pages = int(input('Number of pages to fetch: '))
for page in range(pages):
    # Page 1 has no suffix; later pages use "-page<N>"
    if page == 0:
        url = 'https://www.qidian.com/rank/yuepiao/year2022-month01/'
    else:
        url = f'https://www.qidian.com/rank/yuepiao/year2022-month01-page{page + 1}/'

    response = requests.get(url, headers=headers)
    response_text = response.text
    html_data = etree.HTML(response_text)
    title_list = html_data.xpath('//h2/a/text()')
    print(title_list)
    re_data = re.findall('</style><span class=".*?">(.*?)</span>', response_text)
    print(re_data)

    # The font changes on every request, so fetch the one this response refers to
    font_url = re.findall(r"format\('eot'\); src: url\('(.*?)'\) format\('woff'\)", response_text)[0]
    font_response = requests.get(font_url, headers=headers)
    with open('jiemi.woff', 'wb') as f:
        f.write(font_response.content)
    font_obj = TTFont('jiemi.woff')
    font_obj.saveXML('jiemi.xml')
    cmap_dict = font_obj.getBestCmap()  # {code point: glyph name}
    print("Font cmap (code point -> glyph name):", cmap_dict)

    # Keep only the numeric code points of each obfuscated string
    for index, raw in enumerate(re_data):
        re_data[index] = re.findall(r'\d+', raw)
    print("Code points only:", re_data)

    # Replace glyph names with digit characters
    for code_point in cmap_dict:
        glyph_name = cmap_dict[code_point]
        if glyph_name in dict_e_a:
            cmap_dict[code_point] = dict_e_a[glyph_name]
    print("Cmap after replacing glyph names with digits:", cmap_dict)

    # Replace every code point with the digit it renders as
    for codes in re_data:
        for position, code in enumerate(codes):
            if int(code) in cmap_dict:
                codes[position] = cmap_dict[int(code)]
    print("Decoded monthly-ticket counts:", re_data)

    list_ = [''.join(digits) for digits in re_data]
    print("Final plain-text monthly-ticket counts:", list_)

    # Pair each title with its count
    rank_dict = {}
    for i in range(len(title_list)):
        rank_dict[title_list[i]] = list_[i]
    print(f"Final result for page {page + 1}:", rank_dict)
    print('-' * 50)
    time.sleep(random.randint(1, 2))  # pause 1 to 2 seconds between pages
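For each page, the script prints the titles paired with their decoded monthly-ticket counts as a dictionary, followed by a separator line.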
If you liked this post, feel free to follow me; more articles are on the way (●'◡'●)