Python Web Scraping: selenium
Environment setup
- Download the browser driver: http://chromedriver.storage.googleapis.com/index.html
- Check the mapping between driver and browser versions: http://blog.csdn.net/huilan_same/article/details/51896672
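As a quick sanity check on the version pairing, the comparison can be sketched in a few lines. This is a minimal sketch that assumes the convention used by newer ChromeDriver releases, where the driver's major version matches the browser's; older 2.x drivers follow the mapping table linked above instead. The function names and version strings are hypothetical.

```python
def major_version(version: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(browser_version: str, driver_version: str) -> bool:
    """Assumption: a driver is compatible when its major version equals the browser's."""
    return major_version(browser_version) == major_version(driver_version)


# Hypothetical version strings, for illustration only
print(versions_match("114.0.5735.199", "114.0.5735.90"))
print(versions_match("114.0.5735.199", "113.0.5672.63"))
```

The version strings would normally be read by hand from the browser's "About" page and from the driver's download listing.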
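Usage

from selenium import webdriver
from time import sleep

# Written against the Selenium 3 API (executable_path, find_element_by_*).
# Selenium 4 uses Service objects and find_element(By.ID, ...) instead.

# Instantiate the browser object
bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://www.baidu.com')
sleep(2)

# Locate the search box and type a query
tag_input = bro.find_element_by_id('kw')
tag_input.send_keys('人民币')
sleep(2)

# Locate the search button and click it
btn = bro.find_element_by_id('su')
btn.click()
sleep(2)

# Close the browser
bro.quit()

Xueqiu example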
from selenium import webdriver
from time import sleep

bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://xueqiu.com/')
sleep(5)

# Execute JavaScript to scroll down to the bottom of the page, four times
js = 'window.scrollTo(0,document.body.scrollHeight)'
for _ in range(4):
    bro.execute_script(js)
    sleep(2)

# Locate the "load more" button and click it
a_tag = bro.find_element_by_xpath('//*[@id="app"]/div[3]/div/div[1]/div[2]/div[2]/a')
a_tag.click()
sleep(5)

# Get the current page source (including dynamically loaded content)
print(bro.page_source)
bro.quit()

PhantomJS is a browser with no visible interface (no installation required). It is no longer updated and is not recommended.
from selenium import webdriver
from time import sleep

bro = webdriver.PhantomJS(executable_path=r'\phantomjs-2.1.1-windows\bin\phantomjs.exe')
bro.get('https://xueqiu.com/')
sleep(2)

# Take a screenshot
bro.save_screenshot('./1.png')

# Execute JavaScript to scroll down to the bottom of the page, four times
js = 'window.scrollTo(0,document.body.scrollHeight)'
for _ in range(4):
    bro.execute_script(js)
    sleep(2)
bro.save_screenshot('./2.png')

# a_tag = bro.find_element_by_xpath('//*[@id="app"]/div[3]/div/div[1]/div[2]/div[2]/a')
# bro.save_screenshot('./2.png')
# a_tag.click()
sleep(2)

# Get the current page source (including dynamically loaded content)
print(bro.page_source)
bro.quit()
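The examples above scroll a fixed number of times. One possible variation is to keep scrolling until the page height stops growing. The following is a minimal sketch, not part of the original post: the helper name scroll_until_stable and its parameters pause and max_rounds are hypothetical, and it only assumes that the driver exposes execute_script() as the selenium drivers above do.

```python
from time import sleep


def scroll_until_stable(driver, pause=2.0, max_rounds=10):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object exposing execute_script(), such as a selenium
    WebDriver. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script('return document.body.scrollHeight')
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
        rounds += 1
        sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break  # nothing new was loaded by the last scroll
        last_height = new_height
    return rounds
```

With a real driver this would be called as scroll_until_stable(bro) in place of the repeated execute_script calls. The max_rounds cap is there so that an endlessly growing feed cannot keep the loop running forever.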
Headless Chrome

from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options

# Create an options object that makes Chrome start without a visible window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

bro = webdriver.Chrome(executable_path='./chromedriver.exe', options=chrome_options)
bro.get('https://www.baidu.com')
sleep(2)
bro.save_screenshot('1.png')

# Locate the search box and type a query
tag_input = bro.find_element_by_id('kw')
tag_input.send_keys('人民币')
sleep(2)

# Locate the search button and click it
btn = bro.find_element_by_id('su')
btn.click()
sleep(2)

print(bro.page_source)
bro.quit()

Action chains
from selenium import webdriver
from time import sleep
from selenium.webdriver import ActionChains

bro = webdriver.Chrome(executable_path='./chromedriver.exe')
url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
bro.get(url=url)

# If the target element sits inside an iframe, switch into that frame
# with switch_to before locating the element
bro.switch_to.frame('iframeResult')
source_tag = bro.find_element_by_id('draggable')

# Create an action chain object
action = ActionChains(bro)
action.click_and_hold(source_tag)
for i in range(4):
    # perform() starts executing the action chain
    action.move_by_offset(20,0).perform()
    sleep(1)
bro.quit()

Avoiding selenium detection
Many large websites now monitor for selenium. For example, when we visit Taobao and similar sites with a normal browser, the value of window.navigator.webdriver is undefined, whereas when we visit with selenium the value is true.
This can be addressed by setting a Chromedriver startup parameter. Before starting Chromedriver, enable the experimental option excludeSwitches for Chrome and set its value to ['enable-automation'].
from selenium.webdriver import Chrome
from selenium.webdriver import ChromeOptions

option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = Chrome(options=option)
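Whether the option actually hides the flag may depend on the Chrome and Chromedriver versions in use, so it can be worth reading the value back from the page. The following is a minimal sketch, not part of the original post: the helper names webdriver_flag and looks_automated are hypothetical, and the code only assumes that the driver exposes execute_script().

```python
def webdriver_flag(driver):
    """Return window.navigator.webdriver as seen by the current page.

    `driver` is any object exposing execute_script(), such as a selenium
    WebDriver. A JavaScript undefined or null comes back as None.
    """
    return driver.execute_script('return window.navigator.webdriver')


def looks_automated(driver):
    """True when the page can see the automation flag set to true."""
    return webdriver_flag(driver) is True
```

After driver = Chrome(options=option) and a driver.get(...) call, print(looks_automated(driver)) would show whether the site can still see the flag.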
Reposted from: https://www.cnblogs.com/z1115230598/p/10987165.html