Python crawler urllib notes: 11. [Text] Urllib (Part 2) - Learning Python Web Scraping from Scratch series

Published: 2025/3/21, python, 豆豆


This article is the code document that accompanies video av20148524.

# urllib (Part 2)

# POST

# A URL where POST data and GET query parameters are both present

url = "http://bbs.mumayi.com/member.php?mod=logging&action=login&loginsubmit=yes&infloat=yes&lssubmit=yes&inajax=1"

def getHeaders(temp_header="LwAk_3bcd_lastact=1519728938%09member.php%09logging;"):
    headers = {
        'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        'Accept-Language': "zh-CN,zh;q=0.9,en;q=0.8",
        'Cache-Control': "no-cache",
        'Connection': "keep-alive",
        'Cookie': "UM_distinctid=161d6e3534b2f5-01a7656f105614-32677b04-1aeaa0-161d6e3534cdc4; CNZZDATA30029311=cnzz_eid%3D572485951-1519727285-null%26ntime%3D1519727285; Hm_lvt_6d98eb77bfb4eda47bbaf129bdef0361=1519728678; LwAk_3bcd_pc_size_c=0; LwAk_3bcd_saltkey=ka871zV4; LwAk_3bcd_lastvisit=1519725234; LwAk_3bcd_noticeTitle=1; LwAk_3bcd_sendmail=1; Hm_lpvt_6d98eb77bfb4eda47bbaf129bdef0361=1519728837; " + temp_header,
        'Host': "bbs.mumayi.com",
        'Origin': "http://bbs.mumayi.com",
        'Pragma': "no-cache",
        'Referer': "http://bbs.mumayi.com/",
        'Upgrade-Insecure-Requests': "1",
        'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36",
    }
    return headers

# The data we POST is also written as a dict, here

post_data = {
    "username": "service@52exe.cn",
    "password": "123456a",
    "quickforward": "yes",
    "handlekey": "ls",
}

import urllib.request

import urllib.parse

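# encode -> turn text we can read into bytes we cannot read (encoding)

# decode -> turn bytes we cannot read back into text we can read (decoding)

# With GET, the parameters go in the URL; with POST, the data is passed separately and has to be encoded first.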

encode_data = urllib.parse.urlencode(post_data).encode("utf-8")

request_attr = urllib.request.Request(url=url, data=encode_data, headers=getHeaders())

response_attr = urllib.request.urlopen(request_attr)
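
As a rough illustration of the encoding step used above, the sketch below shows what `urllib.parse.urlencode` produces for a small dict, and that passing `data` makes `urllib.request.Request` use POST instead of GET. The URL `http://example.com/login` and the form values are placeholders, not the site used in this article, and no request is actually sent.

```python
import urllib.parse
import urllib.request

# Hypothetical form fields, for illustration only
form = {"username": "demo", "quickforward": "yes"}

query_string = urllib.parse.urlencode(form)  # str: key=value pairs joined by "&"
body = query_string.encode("utf-8")          # bytes: the type Request expects for data

# Building a Request object does not send anything over the network
get_request = urllib.request.Request("http://example.com/login")
post_request = urllib.request.Request("http://example.com/login", data=body)

print(query_string)
print(get_request.get_method(), post_request.get_method())
```

The method is chosen by the presence of `data`, which is why the login request in this article is sent as a POST even though its URL also carries query parameters.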

print("*"*30)

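import re

# Collect the name=value part of every Set-Cookie header in the login response and join them into one cookie string

temp_header = ";".join(re.findall("Set-Cookie:(.*?);",str(response_attr.headers)))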
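
The regular expression above scans the header text as one string. As an alternative sketch, `response_attr.headers.get_all("Set-Cookie")` returns each Set-Cookie value separately (or `None` when there are none), and the name=value pair is everything before the first `;`. The helper name `cookie_pairs` and the sample values below are made up for illustration.

```python
def cookie_pairs(set_cookie_values):
    """Join the name=value part of each Set-Cookie value into one Cookie header string."""
    pairs = []
    for value in set_cookie_values or []:
        # Attributes such as expires/path follow the first ';' and are not sent back
        pair = value.split(";", 1)[0].strip()
        if pair:
            pairs.append(pair)
    return "; ".join(pairs)


# Made-up sample values shaped like Set-Cookie headers
sample = [
    "LwAk_3bcd_auth=abc123; expires=Thu, 29-Mar-2018 10:00:00 GMT; path=/",
    "LwAk_3bcd_lastact=1519728938; path=/",
]
print(cookie_pairs(sample))
```

With a real response this could be called as `cookie_pairs(response_attr.headers.get_all("Set-Cookie"))`.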

print("*"*30)

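# With the cookies from the login response attached, the content fetched below should be the logged-in version of the page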

set_url = "http://bbs.mumayi.com/home.php?mod=spacecp"

request_attr = urllib.request.Request(url=set_url, headers=getHeaders(temp_header))

response_attr = urllib.request.urlopen(request_attr)

print(response_attr.read().decode("gbk"))
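
The `"gbk"` above is hard-coded for this site. A hedged alternative is to read the charset from the `Content-Type` header and fall back to a default only when none is declared. On a real response, `response_attr.headers.get_content_charset()` performs the same lookup. The function name `charset_from_content_type` and the sample header values are hypothetical.

```python
from email.message import Message


def charset_from_content_type(content_type, default="gbk"):
    """Return the charset declared in a Content-Type value, or the default if none is declared."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter in lowercase, or None
    return msg.get_content_charset() or default


raw = "登录".encode("gbk")  # stand-in for response bytes
charset = charset_from_content_type("text/html; charset=GBK")
print(raw.decode(charset))
```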

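# POST requests are most commonly used for logging in

# Even after a successful login, requests to other pages of the site are not automatically in the logged-in state, so the cookies must be handled properly to make sure data is fetched with the logged-in session.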
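
The standard library can also manage cookies automatically, which is one way to do the cookie handling described above. A minimal sketch, assuming the same login flow: an opener built with `http.cookiejar.CookieJar` and `urllib.request.HTTPCookieProcessor` stores the cookies from each response and sends them back on later requests made through the same opener. The function name `build_cookie_opener` is hypothetical, and the network calls are left commented out.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Create an opener that remembers cookies across requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = build_cookie_opener()

# Usage with the names from this article (needs network access, so left commented out):
# opener.open(urllib.request.Request(url=url, data=encode_data))  # log in, cookies are stored
# page = opener.open(set_url)                                     # stored cookies are sent back
# print(page.read().decode("gbk"))

print(len(jar))  # number of cookies currently stored in the jar
```

With this approach there is no need to copy Set-Cookie values into the `Cookie` header by hand, although any headers the site requires would still have to be set on each request.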
