Crawler Series: Scraping and Storing Proxy IPs for Later Use

Published: 2025/3/21 · Programming Q&A · 豆豆

Code

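This post scrapes free high-anonymity (elite) proxy IPs, collects them in a list, keeps the ones that pass a connectivity check, and saves them as a JSON file. The code is wrapped in a class so that, when scraping data later, the IP address used by the request handler can be updated dynamically, which helps to some extent with getting around anti-scraping measures.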

```python
# *************************** Free high-anonymity proxy IP scraper ****************************
import json

import requests
from bs4 import BeautifulSoup


class ProxySpider(object):
    def __init__(self):
        self.url = "https://www.xicidaili.com/nn/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                          "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
        }
        self.ip_list = []   # every "ip:port" scraped from the listing page
        self.ip_valid = []  # the subset that passed the availability check

    # 1. Send the HTTP request and fetch the page
    def data_request(self):
        return requests.get(self.url, headers=self.headers, timeout=10).content.decode("utf-8")

    # 2. Parse the page
    def data_parse(self, data):
        parse_data = BeautifulSoup(data, "lxml")
        all_proxy = parse_data.select("tr")
        del all_proxy[0]  # the first row is the table header
        for proxy in all_proxy:
            cells = proxy.select("td")
            ip = cells[1].get_text(strip=True)    # 2nd column: IP address
            port = cells[2].get_text(strip=True)  # 3rd column: port
            self.ip_list.append(ip + ":" + port)

    # 3. Check whether each proxy IP actually works
    def ip_validation(self):
        url = "https://www.baidu.com/"
        for ip in self.ip_list:
            # requests picks the proxy by the target URL's scheme; with only an
            # "http" key, this https test request would bypass the proxy entirely.
            free_proxy = {"http": "http://" + ip, "https": "http://" + ip}
            try:
                response = requests.get(url, headers=self.headers,
                                        proxies=free_proxy, timeout=5)
                if response.status_code == 200:
                    self.ip_valid.append(ip)
            except requests.exceptions.RequestException as error:
                # requests raises its own exceptions, not urllib.request.HTTPError
                print(ip, error)

    # 4. Store the data
    def data_save(self):
        with open("free_proxy.json", "w", encoding="utf-8") as fp:
            json.dump(self.ip_valid, fp)

    # 5. Run the whole pipeline
    def run(self):
        data = self.data_request()  # 1. request the page
        self.data_parse(data)       # 2. parse it
        self.ip_validation()        # 3. check IP availability
        self.data_save()            # 4. save the valid IPs


if __name__ == "__main__":
    ProxySpider().run()
```
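The parsing step depends only on the table layout: a header row first, then one row per proxy with the IP in the second `<td>` and the port in the third. A minimal dependency-free sketch of the same step, using the standard library's `html.parser` instead of BeautifulSoup, could look like this. The `ProxyTableParser` class, the `parse_proxies` function and the sample HTML are hypothetical, and the column positions are assumed to match the page scraped above. Rather than deleting the first row by position, it skips any row with fewer than three `<td>` cells, which also drops a header row built from `<th>` cells.

```python
from html.parser import HTMLParser


class ProxyTableParser(HTMLParser):
    """Hypothetical helper: collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text buffer for the <td> currently open, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_proxies(html):
    """Return "ip:port" strings, assuming column 1 is the IP and column 2 the port."""
    parser = ProxyTableParser()
    parser.feed(html)
    # Header rows use <th>, so they have no <td> cells and are skipped here.
    return [row[1] + ":" + row[2] for row in parser.rows if len(row) >= 3]


# Hypothetical sample page in the assumed layout (country, IP, port, ...)
sample = """
<table>
  <tr><th>Country</th><th>IP</th><th>Port</th></tr>
  <tr><td>cn</td><td>122.51.49.88</td><td>8888</td></tr>
  <tr><td>cn</td><td>101.200.81.61</td><td>80</td></tr>
</table>
"""
print(parse_proxies(sample))  # ['122.51.49.88:8888', '101.200.81.61:80']
```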

Result of running the code (the list saved to free_proxy.json):

["122.51.49.88:8888", "118.181.226.166:44640", "122.4.40.194:27430", "115.49.74.102:8118", "101.200.81.61:80", "49.76.237.243:8123", "124.156.98.172:80", "117.88.176.221:3000", "122.51.183.224:808", "119.254.94.93:46323", "59.44.78.30:42335", "27.208.231.100:8060", "113.77.101.202:8118", "124.239.216.14:8060", "101.132.123.99:8080", "60.31.213.115:808", "115.219.168.69:8118", "117.94.213.119:8118", "58.254.220.116:52470", "112.14.47.6:52024", "117.186.49.50:55443", "60.2.44.182:30963", "61.54.225.130:8060", "117.88.176.162:3000", "117.88.177.143:3000", "117.88.176.194:3000", "60.216.101.46:59351", "139.196.193.85:8080", "27.188.65.244:8060", "101.132.190.101:80", "60.190.250.120:8080", "115.46.116.170:8123", "120.198.76.45:41443", "218.59.193.14:47138", "121.237.149.63:3000", "121.237.148.31:3000", "117.88.177.197:3000", "117.88.176.55:3000", "119.180.173.81:8060", "222.95.144.202:3000", "117.88.176.170:3000", "121.237.148.241:3000", "183.195.106.118:8118", "114.104.134.142:8888", "223.68.190.130:8181", "121.237.149.218:3000", "110.189.152.86:52277", "27.184.157.205:8118", "112.194.112.175:8118", "202.107.233.123:8090", "119.84.112.137:80", "211.159.219.225:8118", "115.29.108.117:8118", "183.250.255.86:63000", "117.62.172.230:8118", "111.222.141.127:8118", "218.76.253.201:61408", "218.203.132.117:808", "221.193.94.18:8118", "121.237.149.206:3000", "220.173.143.242:808", "1.197.203.247:9999", "171.35.172.5:9999", "118.114.96.78:8118", "117.87.72.226:8118", "117.88.5.40:3000", "125.123.19.197:8118", "61.150.96.27:46111", "182.32.234.18:9999", "171.35.167.220:9999", "171.35.167.224:9999", "123.168.136.2:9999", "113.194.49.94:9999", "222.85.28.130:40505", "123.206.54.52:8118", "27.184.141.239:8118", "124.93.201.59:59618", "117.114.149.66:53281", "121.237.149.107:3000", "180.117.98.96:8118", "123.132.232.254:37638", "139.224.233.103:8118", "221.218.102.146:33323", "118.24.155.27:8118", "113.12.202.50:40498", "222.190.125.3:8118", "175.148.69.90:1133", 
"218.75.69.50:39590", "118.78.196.186:8118", "222.95.144.59:3000", "121.237.149.136:3000", "117.88.5.250:3000", "171.35.168.177:9999", "121.237.148.179:3000", "223.241.118.200:8010", "58.215.219.2:8000", "180.117.234.56:8118", "117.88.176.93:3000", "123.171.5.132:8118", "119.129.203.140:8118"]
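The stated goal of saving this list is to swap the proxy used by a urllib handler when scraping later. One way to do that is sketched below, assuming free_proxy.json holds plain "ip:port" strings like the ones above and that each proxy speaks plain HTTP: build a new opener around `urllib.request.ProxyHandler` with a randomly chosen address. `load_proxies` and `build_proxy_opener` are hypothetical helper names, not part of the original code.

```python
import json
import random
import urllib.request


def load_proxies(path="free_proxy.json"):
    """Load the "ip:port" strings written by ProxySpider.data_save()."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


def build_proxy_opener(proxies):
    """Pick one proxy at random; return it with an opener that routes http and https through it."""
    proxy = random.choice(proxies)
    # Assumption: every saved proxy is an HTTP proxy, so both schemes use an http:// URL.
    handler = urllib.request.ProxyHandler({
        "http": "http://" + proxy,
        "https": "http://" + proxy,
    })
    # build_opener drops its default ProxyHandler when one is passed in explicitly.
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return proxy, opener
```

To use it, call `proxy, opener = build_proxy_opener(load_proxies())` and then `opener.open(url, timeout=5)`; if a request through that proxy fails, call `build_proxy_opener` again to switch to another address.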

by CyrusMay 2020 04 24

Youth is riding, hand in hand,
a train that never turns back.
Someday we will all grow old;
no regrets, and that's OK.
------ Mayday (五月天) ------
