
Python file downloaded via URL is unreadable: a simple python-selenium downloader, with fixes for common errors

Published: 2024/9/19 · python · 豆豆

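Implementing the simple downloader

The downloader supports proxies, retries on failure, and waits until an element with a given ID is present on the page (adapt this check to your own needs).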

# coding: utf-8
# Note: webdriver.PhantomJS was removed in Selenium 4, so this code assumes Selenium 3.x or earlier.

from Utils import logging  # presumably the author's own helper, called as logging(level, message)
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.proxy import ProxyType
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class HtmlDownloader:

    def __init__(self):
        self.driver = webdriver.PhantomJS()

    def setProxy(self, proxyStr):
        # Open a new session whose DesiredCapabilities carry the proxy settings
        proxy = webdriver.Proxy()
        proxy.proxy_type = ProxyType.MANUAL
        proxy.http_proxy = proxyStr
        # Add the proxy settings to webdriver.DesiredCapabilities.PHANTOMJS
        proxy.add_to_capabilities(webdriver.DesiredCapabilities.PHANTOMJS)
        self.driver.start_session(webdriver.DesiredCapabilities.PHANTOMJS)

    def rmProxy(self):
        # Go back to a direct connection (ProxyType.DIRECT means no proxy)
        proxy = webdriver.Proxy()
        proxy.proxy_type = ProxyType.DIRECT
        proxy.add_to_capabilities(webdriver.DesiredCapabilities.PHANTOMJS)
        self.driver.start_session(webdriver.DesiredCapabilities.PHANTOMJS)

    def download(self, returnType, url, ensureId, proxyStr=None):
        # Every call starts a fresh session, with or without a proxy
        if proxyStr:
            self.setProxy(proxyStr)
        else:
            self.rmProxy()
        self.driver.get(url)
        # special for xxx.com
        # your code here
        # ensure for some element
        try:
            # Wait up to 30 seconds for the element with id == ensureId
            WebDriverWait(self.driver, 30).until(EC.presence_of_element_located((By.ID, ensureId)))
            if returnType == "html":
                downloadResult = self.driver.page_source
            elif returnType == "bs":
                downloadResult = bs(self.driver.page_source, 'lxml')
            else:
                raise ValueError("unknown returnType: %s" % returnType)
            logging("i", "download %s bytes" % len(self.driver.page_source))
            return downloadResult
        except Exception as e:
            # On any failure, log it and fall through; the method then returns None
            logging("e", str(e))
        finally:
            self.driver.close()

    def safeDownload(self, returnType, url, ensureId, proxyStr=None):
        # Retry download() until it succeeds, giving up after 5 failures
        downloadResult = None
        failTimes = 0
        while not downloadResult:
            downloadResult = self.download(returnType, url, ensureId, proxyStr)
            if not downloadResult:
                failTimes += 1
                if failTimes == 5:
                    logging("w", "failed %s times, will abort" % failTimes)
                    break
                logging("w", "failed %s times, will retry" % failTimes)
        return downloadResult
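The retry loop in safeDownload does not depend on selenium, so it can be sketched on its own. The helper below is only an illustration under assumptions: the name `retry` and the parameters `max_attempts` and `delay` are hypothetical and not part of the original code. It calls a function until it returns a truthy value and treats an exception as a failed attempt.

```python
import time


def retry(func, max_attempts=5, delay=0.0):
    """Call func() until it returns a truthy value or attempts run out.

    Returns the first truthy result, or None after max_attempts failures.
    (Hypothetical helper; it mirrors the idea behind safeDownload.)
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = func()
        except Exception as exc:
            # An exception counts as a failed attempt, like a None result
            print("attempt %d raised: %s" % (attempt, exc))
            result = None
        if result:
            return result
        # Optionally pause before the next attempt (never after the last one)
        if attempt < max_attempts and delay > 0:
            time.sleep(delay)
    return None
```

With the class above, a call might look like `retry(lambda: downloader.download("html", url, ensureId))`, where `downloader`, `url` and `ensureId` are whatever values your own script uses.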

Error: the element is not visible, so it cannot be manipulated

# ElementNotVisibleException: Message: {"errorMessage":"Element is not currently visible and may not be manipulated"

# Screenshot: available via screen

First, try setting the window size:

self.driver.set_window_size(1024, 768)

If that does not help, try scrolling the page, for example to the bottom:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
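The two workarounds can be chained so that each is applied only when the previous attempt still fails. The following is a minimal sketch under assumptions: the function name `run_with_visibility_fallbacks` and its parameters are hypothetical, `driver` is any object offering `set_window_size` and `execute_script` as a selenium driver does, and `action` is a callable such as `element.click`. In real use, `errors` would likely be narrowed to `selenium.common.exceptions.ElementNotVisibleException`, the exception named in the message above.

```python
def run_with_visibility_fallbacks(driver, action, errors=(Exception,),
                                  width=1024, height=768):
    """Run action(); if it fails, apply each workaround in turn and retry.

    Workaround 1: set the window size. Workaround 2: scroll to the bottom.
    (Hypothetical helper; names and defaults are assumptions.)
    """
    fixes = [
        lambda: driver.set_window_size(width, height),
        lambda: driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);"),
    ]
    try:
        return action()
    except errors:
        pass  # fall through to the workarounds
    for fix in fixes:
        fix()
        try:
            return action()
        except errors:
            continue  # this workaround was not enough; try the next one
    raise RuntimeError("action still failing after all workarounds")
```

If the action succeeds on the first try, neither workaround runs, so the page is left untouched.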
