Python Basics: urllib

Published: 2023/11/27

Contents

The urllib package
- Introduction
- The urllib.request module
- The urllib.error module
- The urllib.parse module
- Workflow for building a crawler

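The urllib package

Introduction

request: builds and sends network requests; it defines the functions and classes for opening URLs (mostly HTTP) in a variety of complex situations
error: defines the exceptions raised by urllib.request
parse: splits URLs into components and encodes or decodes URL and query-string data
robotparser: parses robots.txt files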
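Of the four modules, robotparser is the only one not covered in the sections that follow, so here is a minimal sketch of it. It feeds the parser a hand-written robots.txt instead of downloading one; the rules and the example.com URLs are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would point the parser
# at a site with set_url() and download the file with read() instead.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)  # parse() accepts the lines of the file directly

# can_fetch(useragent, url) reports whether the rules permit the request.
allowed = parser.can_fetch("*", "http://example.com/index.html")
blocked = parser.can_fetch("*", "http://example.com/private/data.html")
print(allowed, blocked)
```

Checking can_fetch() before each request is a simple way for a crawler to respect a site's stated rules.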

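The urllib.request module

The main function is urlopen(). Its most commonly used parameters are:

url: the URL to open
data: the data to send to the server (bytes); when it is supplied, the request is sent as a POST
timeout: how long to wait for the site, in seconds, before giving up

Full signature:

urllib.request.urlopen(url,data=None,[timeout, ]*,cafile=None,capath=None,cadefault=False,context=None)

cafile, capath and cadefault have been deprecated since Python 3.6; pass an ssl.SSLContext as context instead.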
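The effect of the data parameter can be checked without touching the network. The sketch below uses urllib.request.Request (a standard-library class that does not appear in the text above) to build two requests and inspect which HTTP method each would use. The httpbin.org URL and the hello/world form field are only illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://httpbin.org/post"  # illustrative endpoint, not contacted here

# Without data the request is a GET.
get_request = Request(url)

# data must be bytes, so the form is URL-encoded and then encoded as UTF-8.
form = urlencode({"hello": "world"}).encode("utf-8")
post_request = Request(url, data=form)

print(get_request.get_method())
print(post_request.get_method())

# To actually send the request with a 5-second limit, one would call
# urllib.request.urlopen(post_request, timeout=5).
```

A Request object can be passed to urlopen() in place of a plain URL string, which is also the usual way to attach custom headers.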

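# Use urlopen() to get the links on the first page of Baidu results for a search keyword
from urllib.request import urlopen  # standard-library module
from urllib.parse import urlencode  # standard-library module
import re
# wd = input('Enter a keyword to search for: ')
wd = 'www.toppr.net'  # the search keyword
wd = urlencode({'wd': wd})
url = 'http://www.baidu.com/s?' + wd
page = urlopen(url).read()
# Decode the page and strip newlines and tabs
content = page.decode('utf-8').replace("\n", "").replace("\t", "")
# Match every result heading with a regular expression
title = re.findall(r'<h3 class="t".*?h3>', content)
# Keep the text between 'href =' and 'target=' in each heading
title = [item[item.find('href =') + 6:item.find('target=')] for item in title]
# Remove the spaces and double quotes around each link
title = [item.replace(' ', '').replace('"', '') for item in title]
for item in title:  # iterate over the extracted links
    print(item)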
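The extraction step can be tried offline on a hand-written snippet. The HTML below is a hypothetical imitation of one result heading, written to match the pattern the code expects; the markup of a live results page may differ.

```python
import re

# Hypothetical snippet imitating a single search-result heading.
content = ('<div><h3 class="t"><a href = "http://example.com/a" '
           'target="_blank">First result</a></h3></div>')

# Same three steps as the crawler: match headings, slice out the href
# value, then strip the surrounding spaces and double quotes.
headings = re.findall(r'<h3 class="t".*?h3>', content)
links = [h[h.find('href =') + 6:h.find('target=')] for h in headings]
links = [link.replace(' ', '').replace('"', '') for link in links]
print(links)
```

Slicing by string position is fragile; if the attribute order or spacing in the page changes, the slice returns the wrong text, so an HTML parser is usually the sturdier choice.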

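The urllib.error module

urllib defines two main exceptions, URLError and HTTPError. HTTPError is a subclass of URLError.

HTTPError also carries three attributes:

code: the HTTP status code of the response
reason: the cause of the error
headers: the response headers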

from urllib import request
from urllib.error import HTTPError
try:
    request.urlopen('https://www.baidu.com')
except HTTPError as e:
    print(e.code)  # HTTP status code returned by the server
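Because HTTPError is a subclass of URLError, the order of the except clauses matters: the subclass has to be caught first. The sketch below shows this without any network access by constructing both exceptions by hand; the classify() helper, the example.com URL and the error messages are made up for the illustration.

```python
from email.message import Message
from urllib.error import HTTPError, URLError


def classify(exc):
    """Re-raise exc and report which except clause handles it."""
    try:
        raise exc
    except HTTPError as e:  # subclass first, otherwise URLError catches it
        return f"HTTP error {e.code}: {e.reason}"
    except URLError as e:   # e.g. DNS failure or refused connection
        return f"URL error: {e.reason}"


headers = Message()
headers["Content-Type"] = "text/html"

# HTTPError(url, code, msg, hdrs, fp); no response body is needed here.
http_err = HTTPError("http://example.com/missing", 404, "Not Found", headers, None)
url_err = URLError("name resolution failed")

print(classify(http_err))
print(classify(url_err))
print(http_err.headers["Content-Type"])
```

In a real crawler the same two except clauses would wrap the urlopen() call itself.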

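The urllib.parse module

The data parameter of urlopen() has to be formatted with urllib.parse before it is sent.

urllib.parse.quote(url): (URL encoding) percent-encodes the characters that cannot appear in a URL as-is, mainly non-ASCII characters

urllib.parse.unquote(url): (URL decoding) turns percent-encoded sequences in a URL back into the original characters

urllib.parse.urlencode: converts request data (a dict or a sequence of key/value pairs) into a query string, which must then be encoded to bytes before it is passed as data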
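A short sketch of the three functions working together. The keyword and the pn parameter are made-up sample values, not taken from any particular site.

```python
from urllib.parse import quote, unquote, urlencode

keyword = "python 基础"  # sample text containing a space and non-ASCII characters

# quote() percent-encodes the UTF-8 bytes; unquote() reverses it.
encoded = quote(keyword)
decoded = unquote(encoded)

# urlencode() builds a query string from a dict; spaces become '+'.
query = urlencode({"wd": keyword, "pn": 10})

# For use as the data argument of urlopen(), encode the string to bytes.
body = query.encode("utf-8")

print(encoded)
print(decoded)
print(query)
```

The query string can be appended to a URL after a '?' for a GET request, while the bytes in body are what a POST request would send.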

Workflow for building a crawler

from bs4 import BeautifulSoup  # parse HTML and extract data (third-party)
import re  # regular expressions for text matching
import urllib.request, urllib.error  # build requests and fetch pages
import xlwt  # write Excel files (third-party)
import sqlite3  # SQLite database operations

def main():
    baseurl = "https://movie.douban.com/top250?start="
    # 1. Crawl the pages
    datalist = getData(baseurl)
    savepath = ".\\豆瓣電影TOP250.xls"
    saveData(savepath)

# Crawl the pages
def getData(baseurl):
    datalist = []
    # 2. Parse the data
    return datalist

# 3. Save the data
def saveData(savepath):
    pass  # placeholder: the saving logic goes here

if __name__ == "__main__":
    # Runs when the script is executed directly
    main()
    print("***")
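The base URL in this skeleton ends with start=, so the crawling step has to append an offset for each page. A minimal sketch of that step follows, assuming the list shows 25 entries per page across 10 pages; the page_urls() helper and both defaults are illustrative and not part of the original skeleton.

```python
def page_urls(baseurl, pages=10, page_size=25):
    """Return one URL per result page by appending the start offset."""
    return [baseurl + str(page * page_size) for page in range(pages)]


baseurl = "https://movie.douban.com/top250?start="
urls = page_urls(baseurl)

print(len(urls))
print(urls[0])
print(urls[-1])
```

Inside getData(), each of these URLs would be fetched and parsed in turn, with the extracted rows appended to datalist.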

import urllib.parse
import urllib.request

# Send a GET request
response = urllib.request.urlopen("https://www.baidu.com")
print(response.read().decode('utf-8'))  # decode the response body as UTF-8

# Send a POST request: urlopen() only issues a POST when data is supplied
data = urllib.parse.urlencode({"hello": "world"}).encode('utf-8')  # sample form data
response = urllib.request.urlopen("http://httpbin.org/post", data=data)
print(response.read().decode('utf-8'))
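The difference between the two kinds of request can also be observed without relying on an external site. The sketch below starts a throwaway HTTP server on the local machine and sends it one GET and one POST with urlopen(). The EchoHandler class, the paths and the form data are all made up for the demonstration.

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Local stand-in for a remote site: echoes the method and payload."""

    def _reply(self, text):
        payload = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        self._reply("GET " + self.path)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._reply("POST " + self.rfile.read(length).decode("utf-8"))

    def log_message(self, format, *args):
        pass  # keep the example output quiet


# Ignore any system proxy settings so requests reach the local server.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({})))

server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# No data argument: urlopen() sends a GET.
with urllib.request.urlopen(base + "/hello", timeout=5) as response:
    get_text = response.read().decode("utf-8")

# With data (bytes): urlopen() sends a POST.
data = urllib.parse.urlencode({"hello": "world"}).encode("utf-8")
with urllib.request.urlopen(base + "/post", data=data, timeout=5) as response:
    post_text = response.read().decode("utf-8")

server.shutdown()
server.server_close()
print(get_text)
print(post_text)
```

Using the response as a context manager closes the connection as soon as the body has been read.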
