

Crawlers: Getting Past Hotlink Protection

Published: 2024/4/15

Disclaimer: this post uses a certain image site as its example; the code is for study and reference only!

1. Use Fiddler to visit the site's home page and capture the request headers (the captured result is shown below)

headers = {
    "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
    # Left out on purpose: with this header the response is compressed and looks garbled when inspected locally
    # "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36",
    "Connection": "keep-alive",
    "Referer": "http://www.mzitu.com",
}
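Hotlink protection generally works by having the server inspect the Referer header of each image request and reject requests that do not come from its own pages, which is why Referer is the important entry in the headers above. The following is a minimal sketch of such a server-side check, not the site's actual implementation; the function name `is_request_allowed` and the `ALLOWED_HOST` constant are assumptions made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical host the image server trusts; an assumption for illustration.
ALLOWED_HOST = "www.mzitu.com"


def is_request_allowed(headers, allowed_host=ALLOWED_HOST):
    """Return True if the request's Referer points at the allowed host."""
    referer = headers.get("Referer", "")
    # urlsplit("").hostname is None, so a missing Referer is rejected.
    return urlsplit(referer).hostname == allowed_host


print(is_request_allowed({"Referer": "http://www.mzitu.com"}))
print(is_request_allowed({}))
```

A crawler that sends no Referer, or the wrong one, fails a check of this kind, so the fix on the client side is simply to send the Referer the server expects.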

2. Convert the headers into a list of (name, value) tuples for later use

# opener.addheaders expects a list of (name, value) tuples, not a dict
headall = list(headers.items())
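The conversion is needed because an opener's `addheaders` attribute holds a list of tuples rather than a dict. As a small offline sketch, assuming Python 3 (where `urllib2` became `urllib.request`) and a shortened, made-up header set:

```python
import urllib.request

# Shortened header set for illustration only.
headers = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "http://www.mzitu.com",
}

# addheaders expects a list of (name, value) tuples, not a dict.
headall = list(headers.items())

# Building an opener does not touch the network.
opener = urllib.request.build_opener()
opener.addheaders = headall  # replaces the default User-Agent entry

print(opener.addheaders)
```

Every request made through this opener then carries these headers, including Referer.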

3. Fetch the HTML content

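# Python 3: urllib2 is now urllib.request, and the cookie jar lives in http.cookiejar
import http.cookiejar
import urllib.request

def openhtml():
    cjar = http.cookiejar.CookieJar()
    # 127.0.0.1:8888 is Fiddler's proxy address; routing through it makes the traffic easy to inspect and debug
    proxy = urllib.request.ProxyHandler({"http": "127.0.0.1:8888"})
    opener = urllib.request.build_opener(
        proxy, urllib.request.HTTPHandler, urllib.request.HTTPCookieProcessor(cjar)
    )
    opener.addheaders = headall
    urllib.request.install_opener(opener)
    # read() returns bytes in Python 3, so decode before running a str regex over it
    data = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    return data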
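The Accept-Encoding header is commented out in step 1 because urllib does not decompress responses on its own, so a gzip body shows up as garbage. If you do want to send that header, one possible approach is to decompress the body yourself. The helper below is a sketch under that assumption; the name `decode_body` is hypothetical, and the page is assumed to be UTF-8.

```python
import gzip


def decode_body(raw, content_encoding=""):
    """Decompress a response body if the server gzip-compressed it."""
    if content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    # Assumes the page is UTF-8; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


# Simulate a compressed response without touching the network.
sample = gzip.compress("<html>ok</html>".encode("utf-8"))
print(decode_body(sample, "gzip"))
```

With a real response object, the second argument would come from `response.headers.get("Content-Encoding", "")`.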

4. Use a regular expression to extract all image links and save the images locally

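import os
import re
import urllib.request

def download(data):
    # Match every lazy-loaded image URL and capture only the URL itself
    imgre = re.compile(r"data-original='(.*?\.jpg)")
    imglist = imgre.findall(data)
    # Raw string so the backslashes in the Windows path are not treated as escapes
    save_dir = r"C:\Users\zzz\Desktop\images"
    for x, image_url in enumerate(imglist):
        print(image_url)
        opener = urllib.request.build_opener()
        # The key to getting past hotlink protection: send the same headers (Referer included) with every image request
        opener.addheaders = headall
        image = opener.open(image_url).read()
        with open(os.path.join(save_dir, str(x) + ".jpg"), "wb") as code:
            code.write(image)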
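To see what the regular expression in this step picks up, it helps to run it over a small fragment by hand. The HTML below is made up for illustration (the real page markup may differ), and the pattern uses a capture group so that only the URL is returned and no string replacement is needed afterwards.

```python
import re

# Made-up HTML fragment in the shape the regex expects (single-quoted attribute).
sample_html = (
    "<img class='lazy' data-original='http://img.example.com/a/01.jpg' alt='x'>"
    "<img class='lazy' data-original='http://img.example.com/a/02.jpg' alt='y'>"
)

# Capture only the URL; the non-greedy .*? stops at the first ".jpg".
pattern = re.compile(r"data-original='(.*?\.jpg)")
image_urls = pattern.findall(sample_html)

for index, image_url in enumerate(image_urls):
    print(index, image_url)
```

Note that the pattern only matches single-quoted attributes ending in `.jpg`; pages that use double quotes or other image formats would need a different expression.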

5. Complete code

# coding=utf8
# Python 3: urllib2 is now urllib.request, and the cookie jar lives in http.cookiejar
import http.cookiejar
import os
import re
import urllib.request

url = "http://www.mzitu.com/xinggan"
headers = {
    "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
    # "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36",
    "Connection": "keep-alive",
    "Referer": "http://www.mzitu.com",
}

# opener.addheaders expects a list of (name, value) tuples
headall = list(headers.items())

# Fetch the HTML
def openhtml():
    cjar = http.cookiejar.CookieJar()
    # 127.0.0.1:8888 is Fiddler's proxy address; routing through it makes the traffic easy to inspect and debug
    proxy = urllib.request.ProxyHandler({"http": "127.0.0.1:8888"})
    opener = urllib.request.build_opener(
        proxy, urllib.request.HTTPHandler, urllib.request.HTTPCookieProcessor(cjar)
    )
    opener.addheaders = headall
    urllib.request.install_opener(opener)
    # read() returns bytes in Python 3, so decode before running a str regex over it
    data = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    return data

# Download the images
def download(data):
    # Match every lazy-loaded image URL and capture only the URL itself
    imgre = re.compile(r"data-original='(.*?\.jpg)")
    imglist = imgre.findall(data)
    # Raw string so the backslashes in the Windows path are not treated as escapes
    save_dir = r"C:\Users\zzz\Desktop\images"
    for x, image_url in enumerate(imglist):
        print(image_url)
        opener = urllib.request.build_opener()
        # The key to getting past hotlink protection: send the same headers (Referer included) with every image request
        opener.addheaders = headall
        image = opener.open(image_url).read()
        with open(os.path.join(save_dir, str(x) + ".jpg"), "wb") as code:
            code.write(image)

if __name__ == '__main__':
    data = openhtml()
    download(data)


Reposted from: https://www.cnblogs.com/z-z-z/p/7755763.html
