N Ways to Fetch Web Resources with Python 3
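Over the past couple of days I have been learning how to fetch web resources with Python 3 and came across quite a few approaches, so here are some short notes.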
1. The simplest way
    import urllib.request

    # Open the URL and read the raw response body (bytes).
    response = urllib.request.urlopen('http://python.org/')
    html = response.read()

2. Using Request
    import urllib.request

    # Wrap the URL in a Request object, then open it.
    req = urllib.request.Request('http://python.org/')
    response = urllib.request.urlopen(req)
    the_page = response.read()

3. Sending data
    #!/usr/bin/env python3
    import urllib.parse
    import urllib.request

    url = 'http://localhost/login.php'
    values = {
        'act': 'login',
        'login[email]': 'yzhang@i9i8.com',
        'login[password]': '123456',
    }

    # Request needs the POST body as bytes, so encode the urlencoded string.
    data = urllib.parse.urlencode(values).encode('utf-8')
    req = urllib.request.Request(url, data)
    req.add_header('Referer', 'http://www.python.org/')
    response = urllib.request.urlopen(req)
    the_page = response.read()
    print(the_page.decode("utf8"))

4. Sending data and headers
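It can help to check what a request carries before it is sent. The following is a minimal sketch rather than part of the original notes: `build_login_request` is a hypothetical helper, and the field values and User-Agent string are placeholders. It builds a POST request with a body and a header, then inspects it locally without touching the network.

```python
import urllib.parse
import urllib.request


def build_login_request(url, fields, user_agent):
    """Build a POST request without sending it (hypothetical helper)."""
    # urlencode() returns str; Request expects the body as bytes.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    headers = {"User-Agent": user_agent}
    return urllib.request.Request(url, data=body, headers=headers)


req = build_login_request(
    "http://localhost/login.php",
    {"act": "login", "login[email]": "user@example.com"},
    "Mozilla/5.0 (compatible; example)",
)

print(req.get_method())  # POST, because a body is attached
print(req.data)
# Request normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

Attaching `data` is what switches the method from GET to POST; nothing is transmitted until the request is passed to `urllib.request.urlopen`.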
    #!/usr/bin/env python3
    import urllib.parse
    import urllib.request

    url = 'http://localhost/login.php'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values = {
        'act': 'login',
        'login[email]': 'yzhang@i9i8.com',
        'login[password]': '123456',
    }
    headers = {'User-Agent': user_agent}

    # Request needs the POST body as bytes, so encode the urlencoded string.
    data = urllib.parse.urlencode(values).encode('utf-8')
    req = urllib.request.Request(url, data, headers)
    response = urllib.request.urlopen(req)
    the_page = response.read()
    print(the_page.decode("utf8"))

5. HTTP errors
    #!/usr/bin/env python3
    import urllib.error
    import urllib.request

    req = urllib.request.Request('http://www.python.org/fish.html')
    try:
        urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        # e.code is the HTTP status; the error object also carries the response body.
        print(e.code)
        print(e.read().decode("utf8"))

6. Exception handling 1
    #!/usr/bin/env python3
    from urllib.request import Request, urlopen
    from urllib.error import URLError, HTTPError

    req = Request("http://twitter.com/")
    try:
        response = urlopen(req)
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print('The server couldn\'t fulfill the request.')
        print('Error code: ', e.code)
    except URLError as e:
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
    else:
        print("good!")
        print(response.read().decode("utf8"))

7. Exception handling 2
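When a single `except URLError` clause handles both kinds of failure, the order of the checks matters: in Python 3 an `HTTPError` is a `URLError` and exposes both `code` and `reason`. The sketch below illustrates this with error objects built by hand, so it runs without network access. `describe` is a hypothetical helper, and the URL and messages are made-up placeholders.

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError


def describe(exc):
    """Classify a urllib error (hypothetical helper for illustration)."""
    # Test the subclass first: every HTTPError is also a URLError.
    if isinstance(exc, HTTPError):
        return f"HTTP {exc.code}: {exc.reason}"
    if isinstance(exc, URLError):
        return f"unreachable: {exc.reason}"
    return f"other: {exc!r}"


# Errors constructed manually instead of being raised by a real request.
http_error = HTTPError(
    "http://example.com/missing", 404, "Not Found", Message(), io.BytesIO()
)
url_error = URLError("connection refused")

print(issubclass(HTTPError, URLError))
print(hasattr(http_error, "reason"), hasattr(http_error, "code"))
print(describe(http_error))
print(describe(url_error))

http_error.close()  # release the in-memory body attached to the error
```

Because `reason` is present on both classes, a test for `reason` alone cannot tell the two cases apart; testing for `code` (or using `isinstance`) first does.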
    #!/usr/bin/env python3
    from urllib.request import Request, urlopen
    from urllib.error import URLError

    req = Request("http://twitter.com/")
    try:
        response = urlopen(req)
    except URLError as e:
        # Check 'code' first: in Python 3 an HTTPError also has 'reason',
        # so testing 'reason' first would never reach the HTTP branch.
        if hasattr(e, 'code'):
            print('The server couldn\'t fulfill the request.')
            print('Error code: ', e.code)
        elif hasattr(e, 'reason'):
            print('We failed to reach a server.')
            print('Reason: ', e.reason)
    else:
        print("good!")
        print(response.read().decode("utf8"))

8. HTTP authentication
    #!/usr/bin/env python3
    import urllib.request

    # Create a password manager.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

    # Add the username and password.
    # If we knew the realm, we could use it instead of None.
    top_level_url = "https://cms.tetx.com/"
    password_mgr.add_password(None, top_level_url, 'yzhang', 'cccddd')

    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

    # Create an "opener" (OpenerDirector instance).
    opener = urllib.request.build_opener(handler)

    # Use the opener to fetch a URL.
    a_url = "https://cms.tetx.com/"
    x = opener.open(a_url)
    print(x.read())

    # Install the opener.
    # Now all calls to urllib.request.urlopen use our opener.
    urllib.request.install_opener(opener)

    a = urllib.request.urlopen(a_url).read().decode('utf8')
    print(a)

9. Using a proxy
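    #!/usr/bin/env python3
    import urllib.request

    # ProxyHandler maps URL schemes ('http', 'https') to proxy addresses.
    # urllib has no built-in SOCKS support, so this assumes an HTTP proxy
    # listening on localhost:1080.
    proxy_support = urllib.request.ProxyHandler({'http': 'http://localhost:1080'})
    opener = urllib.request.build_opener(proxy_support)
    urllib.request.install_opener(opener)

    a = urllib.request.urlopen("http://g.cn").read().decode("utf8")
    print(a)

10. Timeouts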
    #!/usr/bin/env python3
    import socket
    import urllib.request

    # Timeout in seconds.
    timeout = 2
    socket.setdefaulttimeout(timeout)

    # This call to urllib.request.urlopen now uses the default timeout
    # we have set in the socket module.
    req = urllib.request.Request('http://twitter.com/')
    a = urllib.request.urlopen(req).read()
    print(a)
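Setting a global socket default affects every socket in the process. As an alternative, `urlopen` also accepts a per-call `timeout` argument. The sketch below is an illustration under stated assumptions, not the original author's code: `SlowHandler` and `fetch` are hypothetical names, and a throwaway local server that answers after one second stands in for a slow site so the example runs offline.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers after a one-second delay."""

    def do_GET(self):
        time.sleep(1.0)
        body = b"late"
        try:
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client already gave up and closed the connection

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url, timeout):
    """Return the response body, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except socket.timeout:
        # Raised directly when waiting for the response takes too long.
        return None
    except urllib.error.URLError as exc:
        # A timeout while connecting arrives wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return None
        raise


# Start the local server on a free port chosen by the OS.
server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

too_short = fetch(url, timeout=0.2)   # shorter than the server's delay
long_enough = fetch(url, timeout=5)   # comfortably longer than the delay

server.shutdown()
server.server_close()

print(too_short)
print(long_enough)
```

The per-call form keeps the timeout next to the request it applies to, which is usually easier to reason about than a process-wide default.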
Reposted from: https://juejin.im/post/5b7686415188253345137109