Python Web Scraping Study Notes (Part 3)
Customizing handlers - Cookies && URLError && basic json usage
Cookies:
Example site: https://www.yaozh.com/
Test1 (without cookies):
Code:
import urllib.request

# 1. Target URL
url = "https://www.yaozh.com/"
# 2. Request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
# 3. Build the request object
request = urllib.request.Request(url, headers=headers)
# 4. Send the request
response = urllib.request.urlopen(request)
# 5. Read the response body (bytes)
data = response.read()
# Save it to a file to check the result
with open('01cookies.html', 'wb') as f:
    f.write(data)
Output:
The page is shown in guest mode, i.e. not logged in.
Test2 (with cookies: logging in by hand):
In the browser's Network panel, find the Cookie header of a request made while logged in.
Code (log in first, then scrape):
"""
Fetch the personal page directly:
copy the cookies captured in the browser by hand
and put them into the headers of the request object.
"""
import urllib.request

# 1. Target URL
url = "https://www.yaozh.com/"
# 2. Request headers, including the Cookie copied from the browser after logging in
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36",
    "Cookie": "acw_tc=707c9fc316119925309487503e709498d3fe1f6beb4457b1cb1399958ad4d3; PHPSESSID=bvc8utedu2sljbdb818m4va8q3; _ga=GA1.2.472741825.1611992531; _gid=GA1.2.2079712096.1611992531; yaozh_logintime=1611992697; yaozh_user=1038868%09s1mpL3; yaozh_userId=1038868; yaozh_jobstatus=kptta67UcJieW6zKnFSe2JyYnoaSZ5htnZqdg26qb21rg66flM6bh5%2BscZdyVNaWz9Gwl4Ny2G%2BenofNlKqpl6XKppZVnKmflWlxg2lolJabd519626986447e0E3cd918611D19BBEbmpaamm6HcNiemZtVq56lloN0pG2SaZ%2BGam2SaWucl5ianZiWbIdw4g%3D%3Da9295385d0680617486debd4ce304305; _gat=1; Hm_lpvt_65968db3ac154c3089d7f9a4cbb98c94=1611992698; yaozh_uidhas=1; yaozh_mylogin=1611992704; acw_tc=707c9fc316119925309487503e709498d3fe1f6beb4457b1cb1399958ad4d3; Hm_lvt_65968db3ac154c3089d7f9a4cbb98c94=1611992531%2C1611992638",
}
# 3. Build the request object
request = urllib.request.Request(url, headers=headers)
# 4. Send the request
response = urllib.request.urlopen(request)
# 5. Read the response body (bytes)
data = response.read()
# Save it to a file to check the result
with open('01cookies2.html', 'wb') as f:
    f.write(data)
Output:
The page is now in the logged-in state, as user s1mpL3.
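Pasting the whole browser Cookie string works, but it is easier to inspect or trim as a dict. The helpers below are a hypothetical sketch (the names cookie_string_to_dict and dict_to_cookie_string are my own, and the sample cookie values are placeholders) that split a copied Cookie header value into name/value pairs and join them back, assuming the simple "name=value; name2=value2" form browsers send.

```python
def cookie_string_to_dict(cookie_header: str) -> dict:
    """Split a copied 'Cookie:' header value into {name: value}.

    Hypothetical helper: assumes the 'a=1; b=2' form browsers send.
    Only the first '=' separates name from value, so values containing '='
    survive; a repeated name (acw_tc appears twice above) keeps the last value.
    """
    cookies = {}
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty pieces such as a trailing ';'
        name, _, value = part.partition("=")
        cookies[name] = value
    return cookies


def dict_to_cookie_string(cookies: dict) -> str:
    """Join {name: value} back into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


if __name__ == "__main__":
    copied = "PHPSESSID=abc123; yaozh_userId=1038868; yaozh_uidhas=1"  # placeholder values
    jar = cookie_string_to_dict(copied)
    print(jar["yaozh_userId"])
    headers = {"Cookie": dict_to_cookie_string(jar)}
    print(headers["Cookie"] == copied)
```

The dict form makes it easy to drop analytics cookies (such as the _ga / Hm_ entries) and test which ones the site actually needs for the logged-in page.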
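Test3 (with cookies: logging in with code):
Preparation:
1. Tick Preserve Log in the Network panel so that the requests of the previous page (the login) are kept when the page changes.
2. The captured login request shows that the login form is submitted with a POST request.
3. After logging in, log out, go to the login page and use Inspect Element to find every field of the login form.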
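Two details from the steps above can be checked without touching the site: urllib.parse.urlencode() turns the form dict into key=value&... text that must then be encoded to bytes, and passing data= to urllib.request.Request is what switches the method from GET to POST. The URL and field values below are placeholders, not yaozh.com's real form.

```python
import urllib.parse
import urllib.request

# Placeholder form fields; the real names come from the login page's <form>.
form = {"username": "demo_user", "pwd": "not-a-real-password"}

# urlencode() percent-encodes the values; a POST body must be bytes.
body = urllib.parse.urlencode(form).encode("utf-8")

get_req = urllib.request.Request("https://www.example.com/login")
post_req = urllib.request.Request("https://www.example.com/login", data=body)

print(body)
print(get_req.get_method())   # GET: no data attached
print(post_req.get_method())  # POST: data is not None
```

Nothing is sent here; the Request objects only describe the requests an opener would make.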
Code:
"""
Fetch the personal page:
1. Log in with code; if the login succeeds, the cookie is valid.
2. Automatically send that cookie when requesting the personal center.
cookiejar: stores cookies automatically.
"""
import urllib.request
from http import cookiejar
from urllib import parse

# Before logging in, the login page is https://www.yaozh.com/login; find the login parameters there.
# The server decides by request method: GET returns the login page, POST returns the login result.

# 1. Log in with code
# 1.1 Login URL
login_url = "https://www.yaozh.com/login"
# 1.2 Login parameters
login_form_data = {
    "username": "s1mpL3",
    "pwd": "***************",  # private, not shown
    "formhash": "87F6F28A4*",  # private, not shown
    "backurl": "https://www.yaozh.com/",  # plain URL: urlencode() below percent-encodes it
}
# The parameters must be URL-encoded; the data of a POST request must be bytes
login_str = parse.urlencode(login_form_data).encode('utf-8')

# 1.3 Send the POST login request
cookie_jar = cookiejar.CookieJar()
# A handler that stores cookies from responses and adds them to later requests
cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)
# Build an opener from the handler
opener = urllib.request.build_opener(cookie_handler)
# Request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
# Passing data makes this a POST request
login_request = urllib.request.Request(login_url, headers=headers, data=login_str)
# If the login succeeds, cookie_jar stores the cookies automatically
opener.open(login_request)

# 2. Visit the personal center; the opener sends the stored cookies
center_url = "https://www.yaozh.com/member/"
center_request = urllib.request.Request(center_url, headers=headers)
# Open the Request object (not the bare URL) so the User-Agent header is sent too
response = opener.open(center_request)
# bytes --> str
data = response.read().decode()
with open('02cookies.html', 'w', encoding="utf-8") as f:
    f.write(data)
Output:
The page comes back logged in as user s1mpL3.
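The CookieJar + HTTPCookieProcessor mechanism used in Test3 can be watched end to end against a throwaway local server instead of a real site. In this sketch the paths /login and /member, the cookie session=abc and the handler class are all made up: the first response sets a cookie, the jar stores it, and the opener sends it back automatically on the next request.

```python
import threading
import urllib.request
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /login sets a cookie, any other path echoes the Cookie header."""

    def do_GET(self):
        if self.path == "/login":
            body = b"logged in"
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc; Path=/")
        else:
            body = (self.headers.get("Cookie") or "no cookie").encode("utf-8")
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_demo() -> str:
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        jar = cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore proxy env vars so the local server is reached directly
            urllib.request.HTTPCookieProcessor(jar),
        )
        with opener.open(base + "/login") as resp:  # the response sets the cookie; the jar stores it
            resp.read()
        stored = [c.name for c in jar]
        with opener.open(base + "/member") as resp:  # the jar adds the Cookie header by itself
            member = resp.read().decode("utf-8")
        return f"{stored} {member}"
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

No Cookie header is written by hand anywhere; that is exactly why the Test3 code only needs to log in once before opening the member page.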
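Notes:
1. Using the cookiejar library: from http import cookiejar, then create a jar with cookiejar.CookieJar()
2. HTTPCookieProcessor(): a handler with cookie support; build_opener() turns it into an opener
3. To log in as a different user with this code, only the username and password need to be changed
4. Python 3 error:
UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 19523: illegal multibyte sequence
Cause: without an encoding argument, open() uses the system's default encoding, which is GBK on Chinese-locale Windows.
Fix: add encoding="utf-8" to open():
with open('02cookies.html', 'w', encoding="utf-8") as f:
    f.write(data)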
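The error in note 4 comes from encoding the page text with GBK, which has no mapping for the non-breaking space '\xa0' (the &nbsp; common in web pages). A minimal sketch of the failure and the fix; the sample text and the temporary file name are arbitrary.

```python
import os
import tempfile

page = "price:\xa0100 爬虫"  # '\xa0' is the non-breaking space produced by &nbsp;

# What open() does implicitly on a GBK-locale system:
try:
    page.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError as exc:
    gbk_ok = False
    print("gbk fails:", exc.reason)

# The fix: name the encoding explicitly when writing and when reading back.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(page)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
print(round_trip == page)
```

Passing the encoding explicitly also makes the script behave the same on Windows, macOS and Linux, whatever the locale.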
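URLError: errors raised by urllib.request
They come in two kinds, URLError and HTTPError, both defined in urllib.error.
HTTPError is a subclass of URLError: URLError means no HTTP response was obtained at all (for example, the host name cannot be resolved), while HTTPError means the server did respond, but with an error status code such as 404.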
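Because HTTPError subclasses URLError, the more specific except clause has to come first; otherwise the URLError branch also swallows HTTP status errors. This offline sketch builds the exceptions by hand (the URL and the reason string are made up) instead of hitting the network, and shows both clause orders.

```python
import urllib.error


def classify(exc: Exception) -> str:
    """Route an exception the way the try/except in Code 3 and Code 4 does."""
    try:
        raise exc
    except urllib.error.HTTPError as e:  # subclass first: the server answered with an error status
        return f"HTTPError {e.code}"
    except urllib.error.URLError as e:   # parent second: no usable response (DNS, refused, ...)
        return f"URLError {e.reason}"


def classify_wrong_order(exc: Exception) -> str:
    """Same clauses reversed: the HTTPError branch can never run."""
    try:
        raise exc
    except urllib.error.URLError:
        return "URLError"
    except urllib.error.HTTPError:  # unreachable: every HTTPError is also a URLError
        return "HTTPError"


not_found = urllib.error.HTTPError("http://example.com/x", 404, "Not Found", None, None)
no_host = urllib.error.URLError("name resolution failed")

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(classify(not_found), "|", classify(no_host))
print(classify_wrong_order(not_found))
```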
Test:
Code 1:
import urllib.request

url = 'http://www.xiaojian.cn'  # hypothetical address
response = urllib.request.urlopen(url)
Output 1:
Part of the traceback: raise URLError(err)
urllib.error.URLError:
Code 2:
import urllib.request

url = 'https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/1111'
response = urllib.request.urlopen(url)
Output 2:
Part of the traceback: raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Code 3:
import urllib.error
import urllib.request

url = 'https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/1111'
try:
    response = urllib.request.urlopen(url)
except urllib.error.HTTPError as error:  # catch the subclass first
    print(error.code)
except urllib.error.URLError as error:
    print(error)
Output 3:
Code 4:
import urllib.error
import urllib.request

url = 'https://blog.cs1'
try:
    response = urllib.request.urlopen(url)
except urllib.error.HTTPError as error:  # catch the subclass first
    print(error.code)
except urllib.error.URLError as error:
    print(error)
Output 4:
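Codes 3 and 4 can be folded into one reusable helper. The sketch below (the name fetch_status and the 5-second timeout are my own choices) returns the HTTP status for both successful and error responses, and None when no response arrives at all. To stay offline it is exercised against a tiny local server that answers 404 to everything, and against local port 1, which is assumed to be closed.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if the request never got a response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:  # the server replied, e.g. 404
        return error.code
    except urllib.error.URLError as error:   # DNS failure, refused connection, connect timeout
        print("no response:", error.reason)
        return None


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that answers every GET with 404."""

    def do_GET(self):
        self.send_error(404, "Not Found")

    def log_message(self, *args):
        pass  # keep the demo output quiet


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(fetch_status(f"http://127.0.0.1:{server.server_port}/missing"))
    server.shutdown()
    server.server_close()
    print(fetch_status("http://127.0.0.1:1/"))  # port 1 is assumed to be closed
```

Returning the code for HTTPError instead of raising lets a crawler log a 404 and move on, while None clearly marks pages that could not be reached at all.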
Requests:
Preparation:
Install the third-party module: pip install requests
Test1 (basic attributes: GET):
Code 1 (no request headers):
import requests

url = "http://www.baidu.com"
response = requests.get(url)
# content attribute: the body as bytes
data = response.content
print(data)
data1 = response.content.decode('utf-8')
print(type(data1))
# text attribute: the body as str, decoded with the charset requests infers from the response headers
# (if that guess is wrong the text comes out garbled, so prefer decoding content yourself)
data2 = response.text
print(type(data2))
Output 1:
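The caveat about text comes down to which charset turns the bytes into str. requests takes response.encoding from the Content-Type header, and when a text/* response declares no charset it falls back to ISO-8859-1 by default, which garbles UTF-8 Chinese pages. The stdlib-only sketch below reproduces that effect on a made-up byte string; with requests, the equivalent fix is to set response.encoding = 'utf-8' before reading text, or to decode content yourself as the notes do.

```python
body = "<title>爬虫</title>".encode("utf-8")  # made-up page: what response.content would hold

wrong = body.decode("iso-8859-1")  # what text yields if the charset is taken to be ISO-8859-1
right = body.decode("utf-8")       # decoding content with the correct charset

print(wrong)
print(right)

# ISO-8859-1 maps every byte, so the wrong step can be reversed losslessly:
print(wrong.encode("iso-8859-1").decode("utf-8") == right)
```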
Code 2 (with request headers):
import requests


class RequestSpider(object):
    def __init__(self):
        url = "https://www.baidu.com/"
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
        self.response = requests.get(url, headers=headers)

    def run(self):
        # 1. Request headers that were actually sent
        request_headers = self.response.request.headers
        print(request_headers)
        # 2. Response headers
        response_headers = self.response.headers
        print(response_headers)
        # 3. Response status code
        code = self.response.status_code
        print(code)
        # 4. Cookies sent with the request (_cookies is a private attribute of the prepared request)
        request_cookie = self.response.request._cookies
        print(request_cookie)
        # Note: visiting Baidu in a browser may show many cookies; those are added by the browser itself, not given by the server here
        # 5. Cookies set by the response
        response_cookie = self.response.cookies
        print(response_cookie)


RequestSpider().run()
Output:
E:\python\python.exe H:/code/Python爬蟲/Day04/03-requests_use2.py
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
{'Bdpagetype': '1', 'Bdqid': '0xe0b22322001a2c4a', 'Cache-Control': 'private', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Sat, 30 Jan 2021 09:27:06 GMT', 'Expires': 'Sat, 30 Jan 2021 09:26:56 GMT', 'P3p': 'CP=" OTI DSP COR IVA OUR IND COM ", CP=" OTI DSP COR IVA OUR IND COM "', 'Server': 'BWS/1.1', 'Set-Cookie': 'BAIDUID=E577CD647F2B1CA6A7C0F4112781CAF9:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BIDUPSID=E577CD647F2B1CA6A7C0F4112781CAF9; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, PSTM=1611998826; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BAIDUID=E577CD647F2B1CA65749857950B007E4:FG=1; max-age=31536000; expires=Sun, 30-Jan-22 09:27:06 GMT; domain=.baidu.com; path=/; version=1; comment=bd, BDSVRTM=0; path=/, BD_HOME=1; path=/, H_PS_PSSID=33423_33516_33402_33273_33590_26350_33568; path=/; domain=.baidu.com, BAIDUID_BFESS=E577CD647F2B1CA6A7C0F4112781CAF9:FG=1; Path=/; Domain=baidu.com; Expires=Thu, 31 Dec 2037 23:55:55 GMT; Max-Age=2147483647; Secure; SameSite=None', 'Strict-Transport-Security': 'max-age=172800', 'Traceid': '1611998826055672090616191042239287929930', 'X-Ua-Compatible': 'IE=Edge,chrome=1', 'Transfer-Encoding': 'chunked'}
200

Process finished with exit code 0
Test2 (automatic URL percent-encoding):
Code 1:
# Browser address after searching: https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&tn=baidu&wd=%E7%88%AC%E8%99%AB&oq=%2526lt%253BcH0%2520-%2520Nu1L&rsv_pq=d38dc072002f5aef&rsv_t=62dcS%2BcocFsilJnL%2FcjmqGeUvo6S6XMFTiyfxi22AnqTbscZBf6K%2F13WW%2Bo&rqlang=cn&rsv_enter=1&rsv_dl=tb&rsv_sug3=4&rsv_sug1=3&rsv_sug7=100&rsv_sug2=0&rsv_btype=t&inputT=875&rsv_sug4=875
# Trimmed to the parameter that matters: https://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
import requests

# Non-ASCII parameters in the URL are percent-encoded automatically
url = "http://www.baidu.com/s?wd=爬虫"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers)
data = response.content.decode()
with open('baidu.html', 'w', encoding="utf-8") as f:
    f.write(data)
Output:
The request succeeds and the file is written: the Chinese characters in the URL were percent-encoded automatically.
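What happens to the Chinese characters in that URL is ordinary percent-encoding of their UTF-8 bytes, which is why the result matches the wd=%E7%88%AC%E8%99%AB seen in the browser address bar. The standard library's urllib.parse.quote shows the same transformation; this is an illustration of the encoding, not requests' internal code path.

```python
from urllib.parse import quote, unquote

keyword = "爬虫"
encoded = quote(keyword)  # UTF-8 bytes E7 88 AC E8 99 AB, each written as %XX
print(encoded)
print("https://www.baidu.com/s?wd=" + encoded)
print(unquote(encoded) == keyword)  # decoding reverses it
```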
Code 2:
# Trimmed search URL: https://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
import requests

# Query parameters passed as a dict are percent-encoded automatically
url = "http://www.baidu.com/s"
params = {'wd': '爬虫'}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers, params=params)
data = response.content.decode()
with open('baidu1.html', 'w', encoding="utf-8") as f:
    f.write(data)
Output:
The request succeeds and the file is written: the params dict was turned into a percent-encoded query string automatically.
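For a params dict, requests builds the query string with urllib.parse.urlencode (with doseq=True), so the standard library shows what ends up in the URL. In this sketch the extra key pn and the sample values are illustrative only; note that urlencode writes spaces as + and expands list values into repeated keys.

```python
from urllib.parse import urlencode

params = {"wd": "爬虫", "pn": 10}  # "pn" is an illustrative extra key
query = urlencode(params)
print("http://www.baidu.com/s?" + query)

print(urlencode({"wd": "python requests"}))         # a space becomes '+'
print(urlencode({"tag": ["a", "b"]}, doseq=True))  # a list expands to repeated keys
```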
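Note:
To send a POST request with parameters, use requests.post(url, data={...}) or requests.post(url, json=...). With a dict, data= sends the parameters as a form-encoded body; json= sends the value as a JSON body.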
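The difference between data= and json= is the body format and its Content-Type: a dict passed as data= is sent form-encoded with Content-Type: application/x-www-form-urlencoded, while json= serializes the value to JSON with Content-Type: application/json. The stdlib sketch below prints both bodies for the same made-up payload so the difference is visible without sending anything.

```python
import json
from urllib.parse import urlencode

payload = {"username": "demo", "page": 1}  # illustrative payload

form_body = urlencode(payload)   # roughly what requests.post(url, data=payload) sends
json_body = json.dumps(payload)  # roughly what requests.post(url, json=payload) sends

print("application/x-www-form-urlencoded:", form_body)
print("application/json:", json_body)
```

Which one to use depends on what the server expects: classic HTML login forms (as in the yaozh.com example) want form data, while JSON APIs usually want json=.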
Test3 (json):
Code:
import json

import requests

url = "https://api.github.com/user"  # this URL returns standard JSON, not HTML
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers)
# str
data = response.content.decode()
print(data)
# str --> dict
data_dict = json.loads(data)
print(data_dict["message"])
# json() converts the JSON text directly into a Python dict or list
data1 = response.json()
print(data1)
print(type(data1))
print(data1["message"])
Output:
E:\python\python.exe H:/code/Python爬蟲/Day04/03-requests_use3.py
{
  "message": "Requires authentication",
  "documentation_url": "https://docs.github.com/rest/reference/users#get-the-authenticated-user"
}
Requires authentication
{'message': 'Requires authentication', 'documentation_url': 'https://docs.github.com/rest/reference/users#get-the-authenticated-user'}
<class 'dict'>
Requires authentication

Process finished with exit code 0
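response.json() raises an error when the body is not valid JSON (an HTML error page, for example), so a crawler usually guards the parse. A sketch using only the json module (the helper name extract_message is hypothetical), fed with the GitHub reply shown above, an HTML string and a JSON list:

```python
import json
from typing import Optional


def extract_message(body: str) -> Optional[str]:
    """Return the "message" field of a JSON object, or None if body is not such JSON."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON at all, e.g. an HTML error page
    if isinstance(data, dict):
        return data.get("message")  # .get avoids KeyError when the field is absent
    return None  # valid JSON, but not an object


github_reply = '{"message": "Requires authentication", "documentation_url": "https://docs.github.com/rest/reference/users#get-the-authenticated-user"}'
print(extract_message(github_reply))
print(extract_message("<html><body>502 Bad Gateway</body></html>"))
print(extract_message("[1, 2, 3]"))
```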