Python usage notes: Python web scraping study notes (3)

Published: 2024/7/23 python 28 豆豆

Custom handlers - Cookies && URLError && basic json usage

Cookies:

The examples scrape https://www.yaozh.com/.

Test1 (without cookies):

Code:

import urllib.request

# 1. Set the URL
url = "https://www.yaozh.com/"

# 2. Add the request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}

# 3. Build the request object
request = urllib.request.Request(url, headers=headers)

# 4. Send the request
response = urllib.request.urlopen(request)

# 5. Read the response body (bytes)
data = response.read()

# Save to a file to verify the data
with open('01cookies.html', 'wb') as f:
    f.write(data)


Result:

The saved page opens in guest mode, i.e. not logged in.

Test2 (with cookies: manual login):

Find the cookies in the Network panel of the browser's developer tools.

Code (log in first, then scrape):

"""
Fetch the personal-center page directly.
Copy the cookies captured in the browser and paste them by hand
into the headers of the Request object.
"""
import urllib.request

# 1. Set the URL
url = "https://www.yaozh.com/"

# 2. Add the request headers, including the copied Cookie
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36",
    "Cookie": "acw_tc=707c9fc316119925309487503e709498d3fe1f6beb4457b1cb1399958ad4d3; PHPSESSID=bvc8utedu2sljbdb818m4va8q3; _ga=GA1.2.472741825.1611992531; _gid=GA1.2.2079712096.1611992531; yaozh_logintime=1611992697; yaozh_user=1038868%09s1mpL3; yaozh_userId=1038868; yaozh_jobstatus=kptta67UcJieW6zKnFSe2JyYnoaSZ5htnZqdg26qb21rg66flM6bh5%2BscZdyVNaWz9Gwl4Ny2G%2BenofNlKqpl6XKppZVnKmflWlxg2lolJabd519626986447e0E3cd918611D19BBEbmpaamm6HcNiemZtVq56lloN0pG2SaZ%2BGam2SaWucl5ianZiWbIdw4g%3D%3Da9295385d0680617486debd4ce304305; _gat=1; Hm_lpvt_65968db3ac154c3089d7f9a4cbb98c94=1611992698; yaozh_uidhas=1; yaozh_mylogin=1611992704; acw_tc=707c9fc316119925309487503e709498d3fe1f6beb4457b1cb1399958ad4d3; Hm_lvt_65968db3ac154c3089d7f9a4cbb98c94=1611992531%2C1611992638",
}

# 3. Build the request object
request = urllib.request.Request(url, headers=headers)

# 4. Send the request
response = urllib.request.urlopen(request)

# 5. Read the response body (bytes)
data = response.read()

# Save to a file to verify the data
with open('01cookies2.html', 'wb') as f:
    f.write(data)


Result:

The page now shows the logged-in state, as user s1mpL3.
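Pasting a long Cookie header by hand is easy to get wrong. A minimal sketch, assuming it is more convenient to keep the cookies in a dict and rebuild the header from it; `parse_cookie_header` and `build_cookie_header` are hypothetical helper names (not part of urllib), and the sample values are placeholders rather than a real session:

```python
def parse_cookie_header(raw):
    """Split a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only: values may contain '=' themselves
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


def build_cookie_header(cookies):
    """Join a dict back into the string expected by the Cookie header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Placeholder values, not a real session
raw = "PHPSESSID=abc123; yaozh_userId=1038868; token=a%3D%3D"
cookies = parse_cookie_header(raw)
headers = {"User-Agent": "Mozilla/5.0", "Cookie": build_cookie_header(cookies)}
print(cookies["PHPSESSID"])
```

The resulting `headers` dict can be passed to `urllib.request.Request(url, headers=headers)` exactly as in Test2.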

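Test3 (with cookies: login from code):

Preparation:

1. Tick Preserve Log in the developer tools so the requests from the previous login stay recorded.

2. From the packets captured during login, note that the login is sent as a POST request.

3. Log out after logging in, open the login page, inspect the elements, and look up each field of the form.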

Code:

"""
Fetch the personal page:
1. Log in from code; once login succeeds, the cookie is valid.
2. Request the personal center with that cookie attached automatically.

cookiejar: saves cookies automatically.
"""
import urllib.request
from http import cookiejar
from urllib import parse

# Before logging in: the login page is https://www.yaozh.com/login; find the login parameters there.
# The backend decides by request method: GET returns the login page, POST returns the login result.

# 1. Log in from code
# 1.1 Login URL
login_url = "https://www.yaozh.com/login"

# 1.2 Login parameters
login_form_data = {
    "username": "s1mpL3",
    "pwd": "***************",  # private, not shown
    "formhash": "87F6F28A4*",  # private, not shown
    # urlencode() below percent-encodes the values, so the plain URL goes here
    "backurl": "https://www.yaozh.com/",
}

# The parameters must be URL-encoded; the data of a POST request must be bytes
login_str = parse.urlencode(login_form_data).encode('utf-8')

# 1.3 Send the POST login request
cookie_jar = cookiejar.CookieJar()
# Define a handler that adds cookie support
cook_handler = urllib.request.HTTPCookieProcessor(cookie_jar)
# Build an opener from the handler
opener = urllib.request.build_opener(cook_handler)

# Add the request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
# Passing data turns this into a POST request
login_request = urllib.request.Request(login_url, headers=headers, data=login_str)
# If login succeeds, the cookie jar saves the cookie automatically
opener.open(login_request)

# 2. Visit the personal center with the saved cookie
center_url = "https://www.yaozh.com/member/"
center_request = urllib.request.Request(center_url, headers=headers)
# Open the Request object rather than the bare URL, so the headers are sent too
response = opener.open(center_request)

# bytes --> str
data = response.read().decode()

with open('02cookies.html', 'w', encoding="utf-8") as f:
    f.write(data)


Result:

The page comes back as user s1mpL3.
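The cookie handling in Test3 can be tried without a real account. The following is a sketch under stated assumptions: `DemoHandler` is a hypothetical stand-in for the site, served from a throwaway local server, where `/login` sets a made-up `session` cookie and every other path echoes back the Cookie header it received. It only illustrates that `HTTPCookieProcessor` stores the cookie from one response and sends it with the next request.

```python
import threading
import urllib.request
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical site: /login sets a cookie, other paths echo the cookie."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=demo123; Path=/")
            body = b"logged in"
        else:
            # Echo whatever Cookie header the client sent
            body = self.headers.get("Cookie", "no cookie").encode("utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

cookie_jar = cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore any system proxy for localhost
    urllib.request.HTTPCookieProcessor(cookie_jar),
)

opener.open(base + "/login").read()  # the jar stores the Set-Cookie value
echoed = opener.open(base + "/member").read().decode("utf-8")

saved = {cookie.name: cookie.value for cookie in cookie_jar}
print(saved)
print(echoed)

server.shutdown()
server.server_close()
```

Iterating over the `CookieJar` is also a quick way to check whether a real login actually produced a session cookie before requesting the protected page.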

Notes:

1. Using the cookiejar library:
from http import cookiejar
cookiejar.CookieJar()

2. HTTPCookieProcessor(): a handler with cookie support.

3. Login from code: only the username and password need changing.

4. Error under Python 3:
UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 19523: illegal multibyte sequence
Fix: pass encoding="utf-8" to open():
with open('02cookies.html', 'w', encoding="utf-8") as f:
    f.write(data)
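The encoding error in note 4 can be reproduced without any network access. A small sketch, assuming a Windows setup where the default file encoding is GBK; the text and the file name are placeholders, and the file is written to a temporary directory:

```python
import os
import tempfile

text = "price:\xa0100"  # '\xa0' is a non-breaking space, common in HTML

# GBK has no mapping for '\xa0', so encoding fails the same way open() did
try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

# UTF-8 can represent every Unicode character, so the write succeeds
path = os.path.join(tempfile.mkdtemp(), "demo.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(gbk_ok, round_trip == text)
```

Passing `encoding` explicitly also makes the script behave the same on every platform, instead of depending on the locale default.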

URLError: the errors raised by urllib.request

There are two kinds: URLError and HTTPError.

HTTPError is a subclass of URLError.
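Because of this subclass relationship, the order of the checks matters: a test for URLError also matches an HTTPError. A minimal sketch, where `classify` is a hypothetical helper and both errors are constructed by hand (with a placeholder URL) rather than raised by a real request:

```python
import io
import urllib.error

# HTTPError derives from URLError, so 'except URLError' also catches it
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))


def classify(error):
    """Return a short label; the subclass must be tested before its parent."""
    if isinstance(error, urllib.error.HTTPError):
        return f"http {error.code}"
    if isinstance(error, urllib.error.URLError):
        return f"url {error.reason}"
    return "other"


# Hand-built errors, only to exercise both branches
not_found = urllib.error.HTTPError(
    "http://example.invalid/", 404, "Not Found", {}, io.BytesIO()
)
unreachable = urllib.error.URLError("no route to host")

print(classify(not_found))
print(classify(unreachable))
```

The same rule applies to `except` clauses: list `HTTPError` before `URLError`, otherwise the more general clause handles everything.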

Test:

Code 1:
import urllib.request

url = 'http://www.xiaojian.cn'  # a hypothetical address
response = urllib.request.urlopen(url)

Result 1:

Partial traceback:
raise URLError(err)
urllib.error.URLError:

Code 2:
import urllib.request

url = 'https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/1111'
response = urllib.request.urlopen(url)

Result 2:

Partial traceback:
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

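Code 3:
import urllib.error
import urllib.request

url = 'https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/1111'
try:
    response = urllib.request.urlopen(url)
# Catch the subclass HTTPError first, then the more general URLError
except urllib.error.HTTPError as error:
    print(error.code)
except urllib.error.URLError as error:
    print(error)

Result 3:

The HTTPError branch runs and prints the status code (404, as in result 2).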

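Code 4:
import urllib.error
import urllib.request

url = 'https://blog.cs1'
try:
    response = urllib.request.urlopen(url)
# Catch the subclass HTTPError first, then the more general URLError
except urllib.error.HTTPError as error:
    print(error.code)
except urllib.error.URLError as error:
    print(error)

Result 4:

No HTTP response comes back from this address, so the URLError branch runs and prints the error.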
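The try/except pattern from codes 3 and 4 can be wrapped in a small function. This is a sketch rather than a definitive implementation: `safe_fetch` is a hypothetical helper name, and the two demo URLs are chosen only because they need no network access (a `data:` URL for the success path, an unsupported scheme for the URLError path).

```python
import urllib.error
import urllib.request


def safe_fetch(url, timeout=10):
    """Return (ok, payload): body bytes on success, an error message otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, response.read()
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404
        return False, f"HTTP {error.code}"
    except urllib.error.URLError as error:
        # No usable response: bad host, refused connection, unknown scheme
        return False, f"URL error: {error.reason}"


# A data: URL needs no network, which makes the success path easy to try out
print(safe_fetch("data:text/plain,hello"))
# An unsupported scheme is reported as a URLError
print(safe_fetch("nosuchscheme://example"))
```

Returning a flag instead of letting the exception propagate keeps a crawl loop running when a single page fails.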

Requests:

Preparation:

Install the third-party module: pip install requests

Test1 (basic attributes: GET):

Code 1 (without request headers):

import requests

url = "http://www.baidu.com"
response = requests.get(url)

# content attribute: the return type is bytes
data = response.content
print(data)

data1 = response.content.decode('utf-8')
print(type(data1))

# text attribute: the return type is str. If the response does not declare an
# encoding, requests guesses one, which can go wrong, so prefer content.
data2 = response.text
print(type(data2))


Result 1: the page is printed as raw bytes, followed by <class 'str'> twice.
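Why decoding `content` yourself is the safer choice can be shown with plain bytes. A minimal sketch using only built-in string methods; ISO-8859-1 stands in here for a wrongly guessed encoding, and the sample text is a placeholder:

```python
# Bytes as a server would send them: UTF-8 encoded Chinese text
raw = "爬虫".encode("utf-8")

# Decoding with the right codec recovers the text
good = raw.decode("utf-8")

# A wrongly guessed codec does not fail, it silently produces mojibake
bad = raw.decode("iso-8859-1")

print(type(raw), type(good))
print(good == "爬虫", bad == "爬虫")
```

The wrong decode raises no exception, which is why a bad guess in `response.text` tends to show up only later as garbled output.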

Code 2 (with request headers):
import requests


class RequestSpider(object):
    def __init__(self):
        url = "https://www.baidu.com/"
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
        self.response = requests.get(url, headers=headers)

    def run(self):
        data = self.response.content

        # 1. Get the request headers
        request_headers = self.response.request.headers
        print(request_headers)

        # 2. Get the response headers
        response_headers = self.response.headers
        print(response_headers)

        # 3. Get the response status code
        code = self.response.status_code
        print(code)

        # 4. Get the cookies sent with the request
        request_cookie = self.response.request._cookies
        print(request_cookie)
        # Note: a browser visiting Baidu may send many cookies; the browser adds
        # those itself, they are not handed out by the server on this request

        # 5. Get the cookies set by the response
        response_cookie = self.response.cookies
        print(response_cookie)


RequestSpider().run()

Result:
E:\python\python.exe H:/code/Python爬虫/Day04/03-requests_use2.py
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
{'Bdpagetype': '1', 'Bdqid': '0xe0b22322001a2c4a', 'Cache-Control': 'private', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Sat, 30 Jan 2021 09:27:06 GMT', 'Expires': 'Sat, 30 Jan 2021 09:26:56 GMT', 'P3p': 'CP=" OTI DSP COR IVA OUR IND COM ", CP=" OTI DSP COR IVA OUR IND COM "', 'Server': 'BWS/1.1', 'Set-Cookie': 'BAIDUID=E577CD647F2B1CA6A7C0F4112781CAF9:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BIDUPSID=E577CD647F2B1CA6A7C0F4112781CAF9; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, PSTM=1611998826; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BAIDUID=E577CD647F2B1CA65749857950B007E4:FG=1; max-age=31536000; expires=Sun, 30-Jan-22 09:27:06 GMT; domain=.baidu.com; path=/; version=1; comment=bd, BDSVRTM=0; path=/, BD_HOME=1; path=/, H_PS_PSSID=33423_33516_33402_33273_33590_26350_33568; path=/; domain=.baidu.com, BAIDUID_BFESS=E577CD647F2B1CA6A7C0F4112781CAF9:FG=1; Path=/; Domain=baidu.com; Expires=Thu, 31 Dec 2037 23:55:55 GMT; Max-Age=2147483647; Secure; SameSite=None', 'Strict-Transport-Security': 'max-age=172800', 'Traceid': '1611998826055672090616191042239287929930', 'X-Ua-Compatible': 'IE=Edge,chrome=1', 'Transfer-Encoding': 'chunked'}
200

The last two values printed are the request and response cookie jars (RequestsCookieJar objects).

Process finished with exit code 0

Test2 (automatic URL escaping):

Code 1:

# https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&tn=baidu&wd=%E7%88%AC%E8%99%AB&oq=%2526lt%253BcH0%2520-%2520Nu1L&rsv_pq=d38dc072002f5aef&rsv_t=62dcS%2BcocFsilJnL%2FcjmqGeUvo6S6XMFTiyfxi22AnqTbscZBf6K%2F13WW%2Bo&rqlang=cn&rsv_enter=1&rsv_dl=tb&rsv_sug3=4&rsv_sug1=3&rsv_sug7=100&rsv_sug2=0&rsv_btype=t&inputT=875&rsv_sug4=875
# https://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
import requests

# The parameter is escaped automatically
url = "http://www.baidu.com/s?wd=爬虫"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers)
data = response.content.decode()

with open('baidu.html', 'w', encoding="utf-8") as f:
    f.write(data)

Result:

The request succeeds and the file is written; the Chinese characters in the query string were escaped automatically.

Code 2:

# https://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
import requests

# The parameters are escaped automatically
url = "http://www.baidu.com/s"
params = {
    'wd': '爬虫',
}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers, params=params)
data = response.content.decode()

with open('baidu1.html', 'w', encoding="utf-8") as f:
    f.write(data)

Result:

The request succeeds and the file is written; the dict passed as params was escaped automatically.
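The escaping that requests performs can be reproduced with the standard library. A short sketch using `urllib.parse`, which yields the same `wd=%E7%88%AC%E8%99%AB` form seen in the URL comment of the code above:

```python
from urllib.parse import quote, unquote, urlencode

keyword = "爬虫"

# Percent-encode a single value (UTF-8 bytes, each written as %XX)
print(quote(keyword))

# Encode a whole dict of query parameters and append it to the base URL
query = urlencode({"wd": keyword})
url = "http://www.baidu.com/s?" + query
print(url)

# Decoding restores the original text
print(unquote("%E7%88%AC%E8%99%AB"))
```

This is handy with plain urllib, where a URL that contains raw Chinese characters has to be escaped by hand before it is opened.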

Note:

To send a POST request with parameters: requests.post(url, data={form fields as a dict}, json=(an object to send as JSON))
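The two options produce different request bodies. A rough sketch of the difference using only the standard library, to show approximately what each option puts on the wire; the payload fields are placeholders and the exact bytes requests sends may differ in detail:

```python
import json
from urllib.parse import urlencode

payload = {"username": "demo", "keyword": "爬虫"}  # placeholder fields

# data=payload: the dict is form-encoded, like an HTML form submission
form_body = urlencode(payload)

# json=payload: the dict is serialized to a JSON document
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

Which one to use depends on what the server expects: login forms such as the one in Test3 usually take form-encoded data, while JSON APIs usually take a JSON body.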

Test3 (json):

Code:
import requests
import json

# This URL returns standard JSON rather than HTML
url = "https://api.github.com/user"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers)

# str
data = response.content.decode()
print(data)

# str --> dict
data_dict = json.loads(data)
print(data_dict["message"])

# json() converts the JSON string to a Python dict or list automatically
data1 = response.json()
print(data1)
print(type(data1))
print(data1["message"])

Result:
E:\python\python.exe H:/code/Python爬虫/Day04/03-requests_use3.py
{  "message": "Requires authentication",  "documentation_url": "https://docs.github.com/rest/reference/users#get-the-authenticated-user"}
Requires authentication
{'message': 'Requires authentication', 'documentation_url': 'https://docs.github.com/rest/reference/users#get-the-authenticated-user'}
<class 'dict'>
Requires authentication

Process finished with exit code 0
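The conversion done by `json.loads` and `response.json()` can be tried offline on the response text shown in the output above. A minimal sketch; the small JSON array is an extra made-up input to show the list case:

```python
import json

# The response text from the output above
text = '{"message": "Requires authentication", "documentation_url": "https://docs.github.com/rest/reference/users#get-the-authenticated-user"}'

data = json.loads(text)  # JSON object -> dict
print(type(data).__name__)
print(data["message"])

items = json.loads('[1, "two", null]')  # JSON array -> list, null -> None
print(items)

# dumps() goes the other way: Python object -> JSON string
print(json.dumps(data, indent=2))
```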
