[Python Learning Diary] Bilibili 小甲鱼: Web Crawlers

Published: 2023/12/4

Web Spider

How Python accesses the Internet

URL + lib --> urllib

The general format of a URL is protocol://hostname[:port]/path/[;parameters][?query]#fragment, where the parts in [] are optional.

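A URL consists of three parts:

The first part is the protocol.

The second part is the domain name or IP address of the server that hosts the resource (sometimes with a port number; every transfer protocol has a default port).

The third part is the specific location of the resource, such as a directory or file name.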
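To see the three parts concretely, here is a minimal sketch that splits a URL with the standard library's urllib.parse.urlsplit. The URL is a made-up example chosen only for illustration; it does not come from the tutorial.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to illustrate the parts described above.
parts = urlsplit('http://www.example.com:8080/docs/index.html?lang=en#intro')

print(parts.scheme)    # part 1: the protocol
print(parts.hostname)  # part 2: the server's domain name or IP address
print(parts.port)      # part 2: the explicit port (None when the URL omits it)
print(parts.path)      # part 3: where the resource lives on the server
print(parts.query)     # optional query string
print(parts.fragment)  # optional fragment
```

No network access is needed here, since urlsplit only parses the string.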

urllib is a package in Python's standard library.

The program below fetches the page data of the Baidu News front page:

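import urllib.request

# Fetch the Baidu News front page; read() returns the body as bytes.
response = urllib.request.urlopen('http://news.baidu.com/')
html = response.read()
# Decode the bytes into a str, assuming the page is served as UTF-8.
html = html.decode('utf-8')
print(html)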

The response body comes back as bytes, so it has to be decoded as UTF-8 to get a string.
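Not every page is UTF-8, so hard-coding the encoding can fail. The sketch below is one possible way to pick the encoding from a Content-Type header value instead. The helper name decode_body and the sample header are assumptions made for illustration; with a real response, the header is available through response.headers.

```python
from email.message import Message


def decode_body(raw: bytes, content_type: str, default: str = 'utf-8') -> str:
    """Decode a response body using the charset named in a Content-Type value.

    Hypothetical helper: falls back to `default` when no charset is given.
    """
    msg = Message()
    msg['Content-Type'] = content_type
    charset = msg.get_content_charset() or default
    # errors='replace' keeps going if a few bytes do not match the charset.
    return raw.decode(charset, errors='replace')


# Simulated GBK-encoded body, so the example runs without network access.
raw = '你好'.encode('gbk')
text = decode_body(raw, 'text/html; charset=gbk')
print(text)
```

With a live response, response.headers.get_content_charset() returns the same information directly, because the headers object is a subclass of email.message.Message.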

Exercise: save a picture of a cat from placekitten

import urllib.request

response = urllib.request.urlopen('http://placekitten.com/g/500/600')
cat_img = response.read()

# Open the file in binary mode ('wb') so the image bytes are written unchanged.
with open('cat_500_600.jpg', 'wb') as f:
    f.write(cat_img)

The argument to urlopen can be either a URL string or a Request object.

So the same code can also be written by instantiating a Request first:

import urllib.request

# Build the Request object explicitly, then hand it to urlopen.
req = urllib.request.Request('http://placekitten.com/g/500/600')
response = urllib.request.urlopen(req)
cat_img = response.read()

with open('cat_500_600.jpg', 'wb') as f:
    f.write(cat_img)
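A Request object can be built and inspected before anything is sent, which helps when checking what urlopen is about to do. The sketch below only constructs the objects and makes no network call. The byte string passed as data is a made-up placeholder.

```python
import urllib.request

url = 'http://placekitten.com/g/500/600'

# A Request with no data is sent as a GET.
req = urllib.request.Request(url)
print(req.full_url)
print(req.get_method())

# Attaching data (placeholder bytes here) turns the same URL into a POST.
post_req = urllib.request.Request(url, data=b'key=value')
print(post_req.get_method())
```

Passing either req or the plain URL string to urllib.request.urlopen gives the same GET request; the object form matters once headers or data need to be attached.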

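Submitting a POST form to Youdao Translate with Python

I tried to scrape Youdao Translate, but it did not succeed, because Youdao Translate has added the anti-scraping parameters salt and sign.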

import urllib.request
import urllib.parse

url1 = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
data = {'i': '你好!', 'type': 'AUTO', 'doctype': 'json', 'version': '2.1',
        'keyfrom': 'fanyi.web', 'ue': 'UTF-8', 'typoresult': 'true'}

data = urllib.parse.urlencode(data).encode('utf-8')  # URL-encode the form, then encode it to bytes
response = urllib.request.urlopen(url1, data)  # passing data sends a POST request and returns the response
html = response.read().decode('utf-8')  # read() returns UTF-8 bytes; decode them into a str
print(html)
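The encoding step is easy to examine on its own, without contacting Youdao. This sketch URL-encodes a small form, turns it into bytes, and then parses it back to confirm nothing was lost. The two-field form is a shortened, illustrative subset of the dictionary used above.

```python
from urllib.parse import urlencode, parse_qs

# Shortened, illustrative subset of the form fields.
form = {'i': '你好!', 'doctype': 'json'}

# urlencode percent-encodes non-ASCII text as UTF-8; encode() then yields the
# bytes object that urlopen expects for a POST body.
body = urlencode(form).encode('utf-8')
print(body)

# Round trip: parse the body back into a dict of lists.
decoded = parse_qs(body.decode('utf-8'))
print(decoded)
```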

Request has a headers parameter, which takes a dictionary.

The headers can be set in two ways:

1. Through the headers parameter of Request

2. Through the Request.add_header() method
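Both ways can be sketched side by side. The URL and the shortened User-Agent string are placeholders chosen for illustration, and no request is actually sent.

```python
import urllib.request

url = 'http://www.example.com/'                   # placeholder URL
ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'  # shortened User-Agent

# Way 1: pass a dict through the headers parameter.
req1 = urllib.request.Request(url, headers={'User-Agent': ua})

# Way 2: create the Request first, then call add_header().
req2 = urllib.request.Request(url)
req2.add_header('User-Agent', ua)

# Request stores header names capitalized, so the key reads 'User-agent'.
print(req1.get_header('User-agent'))
print(req2.header_items())
```

The two requests carry the same header; which form to use is mostly a matter of whether the headers are known when the Request is created.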

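To make a crawler behave more like a human visitor, you can:

1. Use the time module to pause between requests, which limits how many times one IP address accesses the server per unit of time

import time
...
time.sleep(5)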
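The pause between requests can be wrapped in a small loop. This is only a sketch: fetch_all, its fetch argument, and the stand-in values in the demo are hypothetical names introduced here, and the demo swaps in a fake sleep function so it finishes instantly.

```python
import time


def fetch_all(urls, fetch, delay=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls.

    Hypothetical helper: `fetch` is any callable that retrieves one URL, and
    `sleep` defaults to time.sleep but can be replaced for testing.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results


# Demo with stand-ins: str.upper plays the role of a real fetch function,
# and pauses.append records the delays instead of actually sleeping.
pauses = []
pages = fetch_all(['a', 'b', 'c'], fetch=str.upper, delay=5.0, sleep=pauses.append)
print(pages)
print(pauses)
```

In real use, fetch would be something like a function that calls urllib.request.urlopen and reads the body, with the default time.sleep left in place.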

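2. Use a proxy

Requests reach the server through a proxy instead of directly. The steps are:

1. Create a ProxyHandler. Its argument is a dictionary of the form {'type': 'proxy IP:port'}

proxy_support = urllib.request.ProxyHandler({})

2. Build a custom opener

opener = urllib.request.build_opener(proxy_support)

3.1. Install the opener, so that later urllib.request.urlopen() calls go through it

urllib.request.install_opener(opener)

3.2. Or call the opener directly for a single request

opener.open(url)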

The websites used in the tutorial have since added complex anti-scraping mechanisms, so the code below did not run successfully.

import urllib.request

url = 'http://www.whatismyip.com.tw'
proxy_support = urllib.request.ProxyHandler({'http': '221.122.91.66:80'})
opener = urllib.request.build_opener(proxy_support)
# addheaders must be a list of (name, value) tuples, not a set.
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/84.0.4147.105 Safari/537.36')]
urllib.request.install_opener(opener)
response = urllib.request.urlopen(url)
html = response.read().decode('utf-8')
print(html)
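Since the live sites may reject the request, it can help to check the opener's configuration without sending anything. The sketch below builds an opener and inspects it offline. The proxy address 127.0.0.1:8080 is a placeholder rather than a working proxy, and the User-Agent is shortened.

```python
import urllib.request

# Placeholder proxy address; substitute a proxy you are allowed to use.
proxy_support = urllib.request.ProxyHandler({'http': '127.0.0.1:8080'})
opener = urllib.request.build_opener(proxy_support)

# addheaders is a list of (name, value) tuples applied to every request
# made through this opener.
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)')]

# Confirm that the custom ProxyHandler replaced the default one.
proxy_handlers = [h for h in opener.handlers
                  if isinstance(h, urllib.request.ProxyHandler)]
print(proxy_handlers[0].proxies)
print(opener.addheaders)
```

Once the configuration looks right, opener.open(url) sends a single request through the proxy, while urllib.request.install_opener(opener) makes every later urlopen call use it.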
