[Python Crawler] Downloading network resources from known links with urllib.request

Published: 2025/3/15 · python · 豆豆

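Consider this scenario: one column of an Excel sheet holds many links to images, videos, or audio files (call it A), and another column holds a name for each link (call it B). How can we download all of the linked files automatically?
1. Loop over the Excel rows and store each pair of A and B in a two-dimensional list.
2. Download each link with urllib.request, using the link's suffix (.jpg, .mp4, .mp3, etc.) to decide the type of the saved file.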
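
Step 2 depends on reading the file type off the link. The following is a minimal sketch of one way to do that, assuming the link's path ends in a file name with an extension. The helper name `guess_extension` and the `example.com` URLs are hypothetical and are not part of the original script; the sketch only uses `urllib.parse.urlparse` and `os.path.splitext` from the standard library.

```python
import os
from urllib.parse import urlparse


def guess_extension(url, default=""):
    """Return the lowercased extension of the URL's path (e.g. '.mp4').

    Hypothetical helper for illustration; `default` is returned when the
    path carries no extension at all.
    """
    path = urlparse(url).path        # keeps only the path, without query string or fragment
    ext = os.path.splitext(path)[1]  # '' when the last path component has no extension
    return ext.lower() if ext else default


print(guess_extension("https://example.com/media/clip.MP4?token=abc"))  # .mp4
print(guess_extension("https://example.com/img/photo.jpeg"))            # .jpeg
```

Compared with slicing a fixed number of characters off the end of the link, this approach also copes with extensions that are not exactly three letters long and with links that carry a query string.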

The code is as follows:

# Author: cacho_37967865
# Blog: https://blog.csdn.net/sinat_37967865
# File: getFile.py
# Date: 2018-11-24
# Note: read the download information from an Excel file into a list,
#       then loop over the list and download each file (mp4, mp3, jpg, pdf, etc.)

import os
import urllib.request

import xlrd


def get_excel_cell(xlsFile, num, nrows):
    data = xlrd.open_workbook(xlsFile)
    table = data.sheets()[0]
    cellData = []
    # Collect the data of the required columns
    for i in range(num, nrows):  # row range: starts at i=num (inclusive), stops before i=nrows
        row = []
        className = table.cell_value(i, 3)  # column 4: course name
        row.append(className)
        classUrl = table.cell_value(i, 4)  # column 5: course download URL
        row.append(classUrl)
        cellData.append(row)
    return cellData


def get_video(folder, url, fileName, fileType):
    # Build the target path instead of changing the working directory
    path = os.path.join(folder, fileName + fileType)
    try:
        req = urllib.request.Request(url=url)
        req.add_header("User-Agent", "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36")
        with urllib.request.urlopen(req, timeout=40) as video:
            mp4 = video.read()  # read the whole response body as bytes
        # Open the file only after the download succeeded, so a failed
        # request does not leave an empty file behind
        with open(path, "wb") as file:
            file.write(mp4)
            print(type(file), type(req), type(video), type(mp4))
    except Exception as f:
        print(str(f))


if __name__ == '__main__':
    # Raw strings keep the backslashes in Windows paths from being read as escapes
    videoInfo = get_excel_cell(r'F:\PythonProject\Pacong\docs\yuyus185.xls', 182, 183)
    for fileName, url in videoInfo:
        fileType = url[-4:]  # last 4 characters give the content type (.jpg, .mp4, .mp3, etc.)
        print(fileName, fileType, url)
        get_video(r'F:\SoftwareTest', url, fileName, fileType)
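
The script above reads each response fully into memory before writing it out, which can be costly for large video files. As a hedged alternative, the sketch below copies the response to disk in chunks with `shutil.copyfileobj`. The function name `download_stream`, the `.part` temporary suffix, the shortened User-Agent value, and the chunk size are all assumptions made for illustration; they do not come from the original article.

```python
import os
import shutil
import urllib.request


def download_stream(url, dest_path, timeout=40, chunk_size=64 * 1024):
    """Stream the response body of `url` into `dest_path`.

    Hypothetical helper: writes to a temporary '.part' file first and renames
    it on success, so an interrupted download does not leave a truncated file
    under the final name. Returns the number of bytes written.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    tmp_path = dest_path + ".part"
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp, \
                open(tmp_path, "wb") as out:
            shutil.copyfileobj(resp, out, chunk_size)  # copy chunk by chunk
        os.replace(tmp_path, dest_path)  # move into place only when complete
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up the partial file
        raise
    return os.path.getsize(dest_path)


# Example call (placeholder URL and path, not from the article):
# download_stream("https://example.com/media/clip.mp4", r"F:\SoftwareTest\clip.mp4")
```

Unlike `get_video`, this sketch re-raises the exception after cleaning up, leaving it to the caller to decide whether to log the failure and continue with the next link.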
