Jsoup Crawling and Getting Around Anti-Crawling Measures

Published: 2023/12/10

1. Java can also crawl data from third-party websites.

Notes: 1. IP restrictions (an anti-crawling measure)

       2. The Referer header parameter

       3. A spoofed User-Agent (UA) header

Let's try this out on a third-party proxy website.

// Imports this fragment needs:
//   java.io.IOException, java.util.HashMap, java.util.Map, java.util.Random,
//   org.jsoup.Jsoup, org.jsoup.nodes.Document
// ip, port, total, suc, fail, count and logger belong to the enclosing class,
// which the post does not show.
{
    Random r = new Random();
    // Pool of User-Agent strings to rotate through.
    String[] ua = {
        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.87 Safari/537.36 OPR/37.0.2178.32",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586",
        "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
        "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0)",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 BIDUBrowser/8.3 Safari/537.36",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36 Core/1.47.277.400 QQBrowser/9.4.7658.400",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 UBrowser/5.6.12150.8 Safari/537.36",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36 TheWorld 7",
        "Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/60.0"
    };
    // Pick a random User-Agent. Using ua.length (the original hard-coded 14)
    // lets the last entry be chosen as well.
    int i = r.nextInt(ua.length);
    logger.info("Checking ------ {}:{}", ip, port);
    // Form parameters sent with the POST request.
    Map<String, String> map = new HashMap<String, String>();
    map.put("waybillNo", "DD1838768852");
    try {
        total++;
        long a = System.currentTimeMillis();
        // Target site to crawl (here a proxy-IP listing site). Remember to change the URL!
        Document doc = Jsoup.connect("http://xxxx.com/dayProxy/ip/314639.html")
                .timeout(5000)
                //.proxy(ip, port)
                .data(map)
                .ignoreContentType(true)
                .userAgent(ua[i])
                .header("referer", "http://xxxx.com/dayProxy.html") // remember to change this referer too
                .post();
        System.out.println(ip + ":" + port + " access time (ms): " + (System.currentTimeMillis() - a)
                + " result: " + doc.text());
        suc++;
    } catch (IOException e) {
        e.printStackTrace();
        fail++;
    } finally {
        // Print the summary once every request has been attempted.
        if (total == count) {
            System.out.println("Total requests: " + total);
            System.out.println("Succeeded: " + suc);
            System.out.println("Failed: " + fail);
        }
    }
}

The returned data is then parsed through org.jsoup.nodes.Document to extract the IP addresses and ports.
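The post does not show the parsing step, so here is a minimal sketch of one way it could look. It assumes the proxy page lists its entries as "ip:port" in the visible text, which is what doc.text() would return. The class name ProxyListParser and the sample string are hypothetical, and the real page layout may need a different pattern or a CSS selector instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class ProxyListParser {
    // Assumption: proxies appear as "ip:port" in the page text.
    // The pattern does not check that each octet is <= 255.
    private static final Pattern IP_PORT =
            Pattern.compile("\\b(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d{1,5})\\b");

    /** Returns every match as a two-element array {ip, port}. */
    static List<String[]> parse(String pageText) {
        List<String[]> proxies = new ArrayList<>();
        Matcher m = IP_PORT.matcher(pageText);
        while (m.find()) {
            proxies.add(new String[] {m.group(1), m.group(2)});
        }
        return proxies;
    }

    public static void main(String[] args) {
        // Stand-in for doc.text(); made-up addresses for illustration only.
        String sample = "1.2.3.4:8080 @HTTP 10.0.0.7:3128 @HTTP";
        for (String[] p : parse(sample)) {
            System.out.println(p[0] + " port " + p[1]);
        }
    }
}
```

In the crawler, the argument would be doc.text() from the Document returned by post().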

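Then, in the same code as above, just uncomment this line

.proxy(ip, port)

and fill in the corresponding IP and port to switch on proxy access mode.

By the author's estimate, this gets past about 90% of anti-crawling measures.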
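To spread requests over several harvested proxies rather than a single one, the ip and port passed to .proxy(ip, port) can be taken from a rotating pool. The sketch below is one possible way to do that. ProxyRotator is a hypothetical name, the "ip:port" input format is an assumption, and round-robin is only one of several reasonable selection strategies.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original post.
class ProxyRotator {
    private final List<InetSocketAddress> proxies = new ArrayList<>();
    private int next = 0;

    /** Each entry is expected in "ip:port" form (an assumption about the input). */
    ProxyRotator(List<String> entries) {
        for (String entry : entries) {
            String[] parts = entry.trim().split(":");
            // createUnresolved stores host and port without doing a DNS lookup.
            proxies.add(InetSocketAddress.createUnresolved(parts[0], Integer.parseInt(parts[1])));
        }
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("no proxies given");
        }
    }

    /** Hands out the proxies in round-robin order. */
    synchronized InetSocketAddress next() {
        InetSocketAddress p = proxies.get(next);
        next = (next + 1) % proxies.size();
        return p;
    }

    public static void main(String[] args) {
        // Made-up addresses for illustration only.
        ProxyRotator rotator = new ProxyRotator(Arrays.asList("1.2.3.4:8080", "10.0.0.7:3128"));
        for (int n = 0; n < 3; n++) {
            InetSocketAddress p = rotator.next();
            // In the crawler: Jsoup.connect(url).proxy(p.getHostString(), p.getPort())
            System.out.println(p.getHostString() + ":" + p.getPort());
        }
    }
}
```

A proxy that fails repeatedly would normally be dropped from the pool; the fail counter in the code above is a natural place to hook that in.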
