
Web Scraping Study Notes 4: The urllib3 Request Module (GET/POST Examples, File Uploads, IP Proxies, JSON, Binary Data, Timeouts)

Published: 2024/7/5 · Programming Q&A · 豆豆

This article, collected and compiled by 生活随笔, covers the urllib3 request module: GET/POST request examples, file uploads, IP proxies, JSON, binary data, and timeouts. It is shared here for reference.

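1 Introduction to the urllib3 Module

urllib3 is a third-party HTTP request module (it must be installed separately) that offers more functionality than Python's built-in urllib.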

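1.1 Getting to Know urllib3

urllib3 is a powerful, well-organized HTTP client library for Python. It provides many important features that the Python standard library lacks, for example:

Thread safety.

Connection pooling.

Client-side SSL/TLS verification.

File uploads with multipart encoding.

Helpers for retrying requests and handling HTTP redirects.

Support for gzip and deflate encoding.

Support for HTTP and SOCKS proxies.

100% test coverage.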
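To make the gzip/deflate point concrete, here is a small offline sketch. Instead of contacting a real server, it builds an urllib3.HTTPResponse by hand around a gzip-compressed body with a Content-Encoding: gzip header (the body text is made-up sample data). It then compares decode_content=True, where urllib3 decompresses transparently, with decode_content=False, where the raw compressed bytes come back. Constructing HTTPResponse directly is only a testing convenience; in real code the response comes from PoolManager.request().

```python
import gzip
import io

import urllib3

# Made-up sample body, compressed the way a gzip-enabled server would send it
original = b"hello from a gzip-encoded response body"
compressed = gzip.compress(original)


def build_response(decode_content):
    """Build a response locally, standing in for a real gzip-encoded reply."""
    return urllib3.HTTPResponse(
        body=io.BytesIO(compressed),
        headers={"Content-Encoding": "gzip"},
        status=200,
        preload_content=True,
        decode_content=decode_content,
    )


decoded = build_response(decode_content=True).data  # urllib3 decompresses for us
raw = build_response(decode_content=False).data  # compressed bytes, untouched

print(decoded.decode("utf-8"))
print(len(raw), "compressed bytes vs", len(decoded), "decoded bytes")
```

Requests sent with PoolManager.request() use decode_content=True by default, which is why the examples below can read response.data directly.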

1.1.1 Installing urllib3

pip install urllib3

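2 Sending Network Requests

2.1 Sending GET Requests

To send a request with urllib3, first create a PoolManager object, then call its request() method to send the request.

The request() method has the following signature:

request(method, url, fields=None, headers=None, **urlopen_kw)

(In urllib3 2.x the signature also includes body and json parameters.)

method: required; the HTTP method, such as GET, POST, or PUT.
url: required; the URL to request.
fields: optional; the request parameters. For GET, HEAD, DELETE and OPTIONS they are URL-encoded into the query string; for POST and similar methods they are sent in the request body, encoded as multipart/form-data by default.
headers: optional; the request headers.
**urlopen_kw: optional extra keyword arguments passed on to urlopen(), such as retries, timeout, and preload_content.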
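How fields is encoded for a GET request is easy to see with a throwaway local server. The sketch below (the EchoPathHandler class, the /get path and the loopback address are illustrative assumptions, not part of the original article) starts Python's built-in http.server on 127.0.0.1, sends a GET with fields, and prints the path the server actually received, so no external site is needed.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class EchoPathHandler(BaseHTTPRequestHandler):
    """Local test server: replies with the path (including query string) it received."""

    def do_GET(self):
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/get"

http = urllib3.PoolManager()
# For GET, fields are URL-encoded and appended to the URL as a query string
resp = http.request("GET", url, fields={"name": "xiaoli", "age": "1"})
echoed_path = resp.data.decode("utf-8")
print(resp.status, echoed_path)  # 200 /get?name=xiaoli&age=1

server.shutdown()
server.server_close()
```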

2.1.1 Example: Sending a GET Request (and Reading the Response)

```python
import urllib3

urllib3.disable_warnings()  # silence urllib3 warnings (e.g. SSL warnings)
url = "https://www.baidu.com/"
http = urllib3.PoolManager()
response = http.request('GET', url)  # returns an HTTPResponse object
print(response.status)  # prints 200
# response.headers (also returned by response.info()) is a dict-like
# HTTPHeaderDict of the response headers; loop over it to print each one
response_header = response.headers
for key in response_header:
    print(key, ":", response_header.get(key))
# Accept-Ranges : bytes
# Cache-Control : no-cache
# Connection : keep-alive
# Content-Length : 227
# Content-Type : text/html
# Date : Mon, 21 Mar 2022 12:12:23 GMT
# P3p : CP=" OTI DSP COR IVA OUR IND COM ", CP=" OTI DSP COR IVA OUR IND COM "
# Pragma : no-cache
# Server : BWS/1.1
# Set-Cookie : BD_NOT_HTTPS=1; path=/; Max-Age=300, BIDUPSID=E864BF1D7795F2742A7BC13B95F89493; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, PSTM=1647864743; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BAIDUID=E864BF1D7795F27482D1B67B4F266616:FG=1; max-age=31536000; expires=Tue, 21-Mar-23 12:12:23 GMT; domain=.baidu.com; path=/; version=1; comment=bd
# Strict-Transport-Security : max-age=0
# Traceid : 1647864743283252404214760038623219429901
# X-Frame-Options : sameorigin
# X-Ua-Compatible : IE=Edge,chrome=1
```
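The response headers read above (response.headers, which is also what info() returns) form an HTTPHeaderDict: lookups ignore the case of the header name, and .items() yields (name, value) pairs for a tidier loop. This offline sketch builds a response by hand (the header values are sample data, not a real reply from www.baidu.com) to show that behaviour without a network connection.

```python
import io

import urllib3

# Locally built response; the headers are sample data
response = urllib3.HTTPResponse(
    body=io.BytesIO(b"<html></html>"),
    headers={"Content-Type": "text/html", "Cache-Control": "no-cache"},
    status=200,
    preload_content=True,
)

# Header lookups are case-insensitive
content_type = response.headers["content-type"]
cache_control = response.headers.get("CACHE-CONTROL")
missing = response.headers.get("X-Not-There", "absent")  # default for a missing header

# .items() gives (name, value) pairs directly
for name, value in response.headers.items():
    print(name, ":", value)
```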

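2.1.2 Sending a POST Request

```python
import urllib3

url = "http://www.httpbin.org/post"
params = {'name': 'xiaoli', 'age': '1'}
http = urllib3.PoolManager()
# For POST, fields are sent in the body (multipart/form-data by default);
# retries sets the number of retries (default 3)
post = http.request('POST', url, fields=params, retries=5)
print("Response:", post.data.decode('utf-8'))
# decode('unicode_escape') turns \uXXXX escapes into readable Chinese, but it
# garbles raw (unescaped) UTF-8 text; parsing the JSON (section 2.2.1) is more robust
print("Response (with Chinese text):", post.data.decode('unicode_escape'))
```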
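The retries=5 argument above is shorthand for a Retry object. Passing a Retry explicitly gives finer control: separate budgets for connect, read and redirect errors, exponential backoff between attempts, and retries on selected status codes. The limits below are arbitrary illustrative choices. The sketch calls Retry.increment() by hand to mimic two consecutive failures, which shows the backoff growing without touching the network. Note that the default allowed_methods of Retry leaves out POST, so status-code and read-error retries will not apply to POST requests unless that setting is widened.

```python
import urllib3
from urllib3.util import Retry

# Arbitrary illustrative limits: at most 5 retries in total, of which at most
# 2 may be connection errors, 2 read errors and 3 redirects
retry = Retry(
    total=5,
    connect=2,
    read=2,
    redirect=3,
    # get_backoff_time() returns backoff_factor * 2 ** (n - 1) once there
    # have been n >= 2 consecutive errors (0 before that)
    backoff_factor=0.5,
    status_forcelist=(500, 502, 503, 504),  # also retry on these status codes
)
http = urllib3.PoolManager(retries=retry)  # default for every request from this manager

# Simulate two consecutive failures; increment() returns a new Retry object
after_two = retry.increment(method="GET", url="/post").increment(method="GET", url="/post")
print(after_two.total, after_two.get_backoff_time())  # 3 1.0
```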

2.2 Handling Server Responses

2.2.1 Handling JSON Returned by the Server

If the server returns JSON and only one item in it is actually needed, first convert the returned JSON into a dictionary, then read the value for the key you want directly.

```python
import json
import urllib3

url = "http://www.httpbin.org/post"
params = {'name': 'xiaoli', 'age': '1'}
http = urllib3.PoolManager()
post = http.request('POST', url, fields=params, retries=5)  # retries: number of retries (default 3)
# Convert the response body into a dict; json.loads also decodes \uXXXX escapes,
# so Chinese text needs no separate 'unicode_escape' step
post_json = json.loads(post.data.decode('utf-8'))
print("Value for name:", post_json.get('form').get('name'))  # Value for name: xiaoli
```
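Why json.loads is enough even for Chinese text: JSON encoders (httpbin's included) often escape non-ASCII characters as \uXXXX sequences, and json.loads turns those escapes back into characters itself. By contrast, decode('unicode_escape') treats any raw UTF-8 bytes as Latin-1 and garbles them. The sketch parses hypothetical response bytes shaped like an httpbin.org/post reply (the name 小李 is made-up sample data) with no unicode_escape step.

```python
import json

# Hypothetical raw response bytes: non-ASCII characters arrive as \uXXXX escapes
raw = b'{"form": {"name": "\\u5c0f\\u674e", "age": "1"}}'

payload = json.loads(raw)  # json.loads accepts UTF-8 bytes and decodes \uXXXX itself
name = payload["form"]["name"]
age = payload.get("form", {}).get("age")  # .get() avoids a KeyError if a key is missing
print(name, age)  # 小李 1
```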

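2.2.2 Handling Binary Data Returned by the Server (Images)

```python
import urllib3

urllib3.disable_warnings()
url = 'https://img-blog.csdnimg.cn/2020060123063865.png'
http = urllib3.PoolManager()
response = http.request('GET', url)  # the image arrives as bytes in response.data
print(response.data)
with open('./p.png', 'wb') as f:  # open the file in binary write mode
    f.write(response.data)  # write the image bytes
```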
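For large images or other big files, reading everything into response.data keeps the whole body in memory. A common alternative is preload_content=False plus response.stream(), which yields the body in chunks that can be written straight to disk. The sketch serves 16 KiB of fake image bytes from a local http.server (the handler, payload and output file name are illustrative assumptions) so it runs offline.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3

PAYLOAD = bytes(range(256)) * 64  # 16 KiB of fake "image" bytes


class BytesHandler(BaseHTTPRequestHandler):
    """Local test server that returns PAYLOAD as if it were a PNG file."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), BytesHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/p.png"

http = urllib3.PoolManager()
# preload_content=False: do not read the whole body into memory up front
resp = http.request("GET", url, preload_content=False)
out_path = os.path.join(tempfile.mkdtemp(), "p.png")
with open(out_path, "wb") as f:
    for chunk in resp.stream(4096):  # yields the body 4096 bytes at a time
        f.write(chunk)
resp.release_conn()  # hand the connection back to the pool
server.shutdown()
server.server_close()

with open(out_path, "rb") as f:
    saved = f.read()
print(resp.status, len(saved))  # 200 16384
```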

2.2.3 Setting Request Headers

```python
import urllib3

urllib3.disable_warnings()
url = 'https://www.baidu.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
http = urllib3.PoolManager()
response = http.request('GET', url, headers=headers)  # send the custom headers with this request
print(response.data.decode('utf-8'))
```
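Besides a hand-written dict, urllib3 provides a make_headers() helper for common headers, and a PoolManager can take default headers that apply to every request it sends. A headers= argument passed to an individual request() call replaces those defaults rather than merging with them, so repeat anything you still need there. The sketch only builds the objects and sends nothing.

```python
import urllib3

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
)

# make_headers() builds a plain dict with lowercase header names
headers = urllib3.make_headers(user_agent=USER_AGENT, keep_alive=True)
print(headers)

# Default headers for every request sent through this manager
http = urllib3.PoolManager(headers=headers)
```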

2.2.4 Setting a Timeout

```python
import urllib3  # import the urllib3 module
from urllib3.exceptions import MaxRetryError

urllib3.disable_warnings()  # silence urllib3 warnings (such as SSL warnings)
baidu_url = 'https://www.baidu.com/'  # Baidu URL for the timeout test
python_url = 'https://www.python.org/'  # Python URL for the timeout test

http = urllib3.PoolManager()  # create a connection pool manager
try:
    # Send a GET request with a per-request timeout of 0.01 seconds
    r = http.request('GET', baidu_url, timeout=0.01)
except MaxRetryError as error:  # raised once the default retries are used up
    print('Baidu timed out:', error)
# Baidu timed out: HTTPSConnectionPool(host='www.baidu.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002690D2057F0>, 'Connection to www.baidu.com timed out. (connect timeout=0.01)'))

# Create a pool manager whose requests all default to a 0.1-second timeout
http2 = urllib3.PoolManager(timeout=0.1)
try:
    r = http2.request('GET', python_url)  # send a GET request
except MaxRetryError as error:
    print('Python timed out:', error)
# Python timed out: HTTPSConnectionPool(host='www.python.org', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002690D21A910>, 'Connection to www.python.org timed out. (connect timeout=0.1)'))
```
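A single number sets both the connect and the read timeout. urllib3.Timeout lets you set them separately, which matters because a server can accept the connection quickly and then be slow to answer. The sketch below starts a deliberately slow local server (SlowHandler and its 1-second delay are illustrative assumptions), gives the client a 0.2-second read timeout, and disables retries with retries=False so the underlying ReadTimeoutError surfaces directly instead of being wrapped in MaxRetryError as in the output above.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3
from urllib3.exceptions import MaxRetryError, ReadTimeoutError


class SlowHandler(BaseHTTPRequestHandler):
    """Local test server that waits 1 second before answering."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"too late")
        except OSError:
            pass  # the client has already given up and closed the connection

    def log_message(self, *args):
        pass  # keep the console quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# 2 s to establish the TCP connection, but only 0.2 s to wait for the reply
timeout = urllib3.Timeout(connect=2.0, read=0.2)
http = urllib3.PoolManager(timeout=timeout)

timed_out = False
try:
    http.request("GET", url, retries=False)  # retries=False: raise the first error as-is
except (ReadTimeoutError, MaxRetryError) as error:
    timed_out = True
    print("Read timed out:", error)
finally:
    server.shutdown()
    server.server_close()
```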

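2.2.5 Setting an IP Proxy

```python
import urllib3  # import the urllib3 module

url = "http://httpbin.org/ip"  # test URL that returns the IP the request came from
# Firefox request headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0'}
# Create a proxy manager that routes requests through the proxy at 120.27.110.143:80
proxy = urllib3.ProxyManager('http://120.27.110.143:80', headers=headers)
r = proxy.request('GET', url, timeout=2.0)  # send the request
print(r.data.decode())  # print the result
```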
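If the proxy needs a username and password, urllib3.make_headers() can build the Proxy-Authorization header, which belongs in proxy_headers (used to talk to the proxy) rather than in headers (used for the proxied request). The proxy address and the user:pass credentials below are placeholders, and the sketch only constructs the manager, so nothing is sent. For SOCKS proxies, urllib3 offers urllib3.contrib.socks.SOCKSProxyManager, which needs the optional PySocks dependency (pip install urllib3[socks]).

```python
import urllib3

# Placeholder proxy address and credentials: replace with a proxy you control
proxy_headers = urllib3.make_headers(proxy_basic_auth="user:pass")
proxy = urllib3.ProxyManager(
    "http://127.0.0.1:8080",
    proxy_headers=proxy_headers,  # used to authenticate with the proxy
    headers={"User-Agent": "Mozilla/5.0"},  # headers for the proxied request
)

print(proxy.proxy.host, proxy.proxy.port)  # 127.0.0.1 8080
print(proxy_headers)  # {'proxy-authorization': 'Basic dXNlcjpwYXNz'}
```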

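2.3 Uploading

2.3.1 Uploading Text

```python
import json
import urllib3

with open('./test.txt', encoding='utf-8') as f:  # open the text file
    data = f.read()  # read the file
url = "http://httpbin.org/post"
http = urllib3.PoolManager()
# A (filename, data) tuple in fields is sent as a file in a multipart/form-data body
post = http.request('POST', url, fields={'filefield': ('upload.txt', data)})
files = json.loads(post.data.decode('utf-8'))['files']  # the uploaded file contents echoed by httpbin
print(files)  # print the uploaded text
# {'filefield': '在學習中尋找快樂！'}
```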
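What a fields argument containing a (filename, data) tuple actually sends is a multipart/form-data body. urllib3.encode_multipart_formdata() builds the same kind of body offline, which is handy for checking what goes over the wire. The sketch adds an explicit MIME type as the optional third tuple element; the field name and text are sample data.

```python
import urllib3

text = "在學習中尋找快樂！"  # sample file content
fields = {"filefield": ("upload.txt", text, "text/plain")}  # (filename, data, MIME type)

body, content_type = urllib3.encode_multipart_formdata(fields)

print(content_type)  # 'multipart/form-data; boundary=' followed by a random boundary
print(body.decode("utf-8"))
```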

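2.3.2 Uploading an Image File

```python
import urllib3

with open('p.png', 'rb') as f:  # read the image as bytes
    data = f.read()
url = "http://httpbin.org/post"
http = urllib3.PoolManager()
# Send the image bytes as the raw request body; p.png is a PNG, so declare image/png
post = http.request('POST', url, body=data, headers={'Content-Type': 'image/png'})
print(post.data.decode())
```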
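Hard-coding the Content-Type is easy to get wrong (a .png declared as image/jpeg, for example). The standard mimetypes module can guess it from the file name. The sketch posts fake PNG bytes as the raw body to a local echo server (EchoUploadHandler, the fake bytes and the file name are illustrative assumptions), which replies with the Content-Type and byte count it received.

```python
import mimetypes
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class EchoUploadHandler(BaseHTTPRequestHandler):
    """Local test server: replies with the received Content-Type and body size."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        received = self.rfile.read(length)
        reply = f"{self.headers['Content-Type']} {len(received)}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), EchoUploadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/post"

image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 100  # PNG signature plus filler, not a real image
content_type, _ = mimetypes.guess_type("p.png")
content_type = content_type or "application/octet-stream"  # fallback for unknown extensions

http = urllib3.PoolManager()
resp = http.request("POST", url, body=image_bytes, headers={"Content-Type": content_type})
reply = resp.data.decode("utf-8")
print(reply)  # image/png 108

server.shutdown()
server.server_close()
```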
