Batch Website URL Liveness Checking in Python
Requirements, features, and notes
1. Client
2. Server
Test environment: Windows 7, Python 3.3.2, chardet 2.3.0
Purpose of the script:
Detect links in the system whose requests fail, i.e. return a status code other than 200.
Outline:
Development environment
Runtime environment
Business logic flowchart
Project structure diagram
Demo screenshots (actual run)
Techniques used in the script
Problems encountered and their solutions
Summary
This post uses Python to batch-test the availability of a set of URLs (including HTTP status, response time, and so on) and to collect statistics such as how many times, and how frequently, each URL was unavailable.
Similarly, this kind of script can be used to judge whether a given service is available, and to pick the best provider among many.
The requirements, and what the script implements, are as follows:
By default, running the script checks the availability of a set of URLs.
If a URL is available, it reports information such as the time taken from the machine running the script to the HTTP server and the response content.
If a URL is unavailable, it records and reports this, including the time at which the failure occurred.
By default the maximum allowed number of errors is 200 (the number can be customized); once that limit is reached, per-URL error statistics are printed at the end of the output.
If the user stops the script manually, per-URL error statistics are likewise printed at the end of the output.
Some techniques used in the script (a minimal sketch of the gevent + signal pattern follows this list):
Use gevent to handle multiple HTTP requests concurrently, so requests do not have to wait for each other's responses (gevent has many more tricks worth exploring on your own);
Use the signal module to catch SIGINT, handle it, and exit cleanly, instead of the main process receiving a KeyboardInterrupt it cannot handle;
Note the small tricks used for counting errors in the script.
Screenshot of the script running (image not reproduced here).
The script is also available on GitHub: https://github.com/DingGuodong/LinuxBashShellScriptForOps/tree/master/projects/checkServicesAvailability/HttpService
The script is as follows:
#!/usr/bin/python
# encoding: utf-8
# -*- coding: utf8 -*-
"""
Created by PyCharm.
File:               LinuxBashShellScriptForOps:testNoHttpResponseException,testHttpHostAvailability.py
User:               Guodong
Create Date:        2016/10/26
Create Time:        12:09

Function:
    test Http Host Availability

Some helpful message:
    For CentOS: yum -y install python-devel python-pip; pip install gevent
    For Ubuntu: apt-get -y install python-dev python-pip; pip install gevent
    For Windows: pip install gevent
"""
import signal
import time
import sys

# execute some operations concurrently using python
from gevent import monkey

monkey.patch_all()
import gevent
import urllib2

hosts = ['https://webpush.wx2.qq.com/cgi-bin/mmwebwx-bin/synccheck',
         'https://webpush.wx.qq.com/cgi-bin/mmwebwx-bin/synccheck', ]

errorStopCounts = 200

quit_flag = False
statistics = dict()


def changeQuit_flag(signum, frame):
    del signum, frame
    global quit_flag
    quit_flag = True
    print "Canceled task on their own by the user."


def testNoHttpResponseException(url):
    tryFlag = True
    global quit_flag
    errorCounts = 0
    tryCounts = 0
    global statistics
    globalStartTime = time.time()
    while tryFlag:
        if not quit_flag:
            tryCounts += 1
            print('GET: %s' % url)
            try:
                startTime = time.time()
                resp = urllib2.urlopen(url)  # using module 'request' will be better, request will return header info..
                endTime = time.time()
                data = resp.read()
                responseTime = endTime - startTime
                print '%d bytes received from %s. response time is: %s' % (len(data), url, responseTime)
                print "data received from %s at %d try is: %s" % (url, tryCounts, data)
                gevent.sleep(2)
            except urllib2.HTTPError as e:
                errorCounts += 1
                statistics[url] = errorCounts
                currentTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
                print "HTTPError occurred, %s, and this is %d times(total) occurs on %s at %s." % (
                    e, statistics[url], url, currentTime)

                if errorCounts >= errorStopCounts:
                    globalEndTime = time.time()
                    tryFlag = False
        else:
            globalEndTime = time.time()
            break

    for url in statistics:
        print "Total error counts is %d on %s" % (statistics[url], url)
        hosts.remove(url)
    for url in hosts:
        print "Total error counts is 0 on %s" % url
    globalUsedTime = globalEndTime - globalStartTime
    print "Total time use is %s" % globalUsedTime
    sys.exit(0)


try:
    # Even if the user cancelled the task,
    # it also can statistics the number of errors and the consumption of time for each host.
    signal.signal(signal.SIGINT, changeQuit_flag)

    gevent.joinall([gevent.spawn(testNoHttpResponseException, host) for host in hosts])
except KeyboardInterrupt:
    # Note: this line can NOT be reached, because signal has been captured!
    print "Canceled task on their own by the user."
    sys.exit(0)
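The script above is written for Python 2 (urllib2 and print statements). Purely as a hedged sketch of how the timed-request step could look on Python 3 with the standard library (my own adaptation, not part of the original script):

import time
import urllib.error
import urllib.request


def timed_get(url):
    # Fetch one URL, print the size and elapsed time, and report HTTP errors.
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = resp.read()
        print('%d bytes received from %s. response time is: %.3fs' % (len(data), url, time.time() - start))
        return True
    except urllib.error.HTTPError as e:
        print('HTTPError occurred, %s, on %s' % (e, url))
        return False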
When doing penetration testing, or promoting a product to a large number of clients, I had a fairly large project covering several hundred websites, so the first step was to determine which sites were up and which were not. I therefore wrote a small script to make this easier in the future.
The implementation is as follows:
#!/usr/bin/python
# -*- coding: UTF-8 -*-
'''
@Author: w2n1ck
@Blog: http://byd.dropsec.xyz/
'''
import requests
import sys

f = open('url.txt', 'r')
url = f.readlines()
length = len(url)
url_result_success = []
url_result_failed = []
for i in range(0, length):
    try:
        response = requests.get(url[i].strip(), verify=False, allow_redirects=True, timeout=5)
        if response.status_code != 200:
            raise requests.RequestException(u"Status code error: {}".format(response.status_code))
    except requests.RequestException as e:
        url_result_failed.append(url[i])
        continue
    url_result_success.append(url[i])
f.close()
result_len = len(url_result_success)
for i in range(0, result_len):
    print 'URL %s opened successfully' % url_result_success[i].strip()
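The script reads its targets from url.txt in the current directory, one URL per line; for example (the entries below are placeholders):

https://example.com
http://example.org/login
https://www.example.net:8080/

It is then run with a Python 2 interpreter, e.g. python check_urls.py (the script file name is hypothetical).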
Test result screenshot (image not reproduced here).
Problems encountered:
When I first tested the script, any URL that errored out or simply did not exist made the program report an error and stop immediately. It turned out the problem was the response.status_code != 200 check:
some sites cannot be opened at all and never return a status code, so the program had nothing to compare against 200 and crashed instead of handling the failure.
Solution:
Catch the exception with try/except.
The relevant code is:
try:
    response = requests.get(url[i].strip(), verify=False, allow_redirects=True, timeout=5)
    if response.status_code != 200:
        raise requests.RequestException(u"Status code error: {}".format(response.status_code))
except requests.RequestException as e:
    url_result_failed.append(url[i])
    continue
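If you also want to know why each URL failed, a variation like the following could classify the failures by catching the more specific requests exceptions first (this is my own sketch, not from the original post; Timeout and ConnectionError are both subclasses of RequestException):

import requests


def classify_url(raw_url):
    # Return (url, 'ok') for reachable URLs, or (url, reason) describing the failure.
    u = raw_url.strip()
    try:
        response = requests.get(u, verify=False, allow_redirects=True, timeout=5)
        if response.status_code != 200:
            return u, 'status code {}'.format(response.status_code)
        return u, 'ok'
    except requests.exceptions.Timeout:
        return u, 'timeout'
    except requests.exceptions.ConnectionError:
        return u, 'connection error (DNS failure or connection refused)'
    except requests.exceptions.RequestException as e:
        return u, str(e)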
Checking URL status with Python
Check URL status with Python and append the URLs that return 200 to a file:
1. Requests
#! /usr/bin/env python
# coding=utf-8
import sys
import requests


def getHttpStatusCode(url):
    try:
        request = requests.get(url)
        httpStatusCode = request.status_code
        return httpStatusCode
    except requests.exceptions.HTTPError as e:
        return e


if __name__ == "__main__":
    with open('1.txt', 'r') as f:
        for line in f:
            try:
                status = getHttpStatusCode(line.strip('\n'))  # strip the trailing newline
                if status == 200:
                    with open('200.txt', 'a') as f:
                        f.write(line + '\n')
                        print line
                else:
                    print 'no 200 code'
            except Exception as e:
                print e
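One caveat (my own observation, not from the original post): requests.get only raises HTTPError if you call raise_for_status() explicitly, while an unreachable host raises ConnectionError or Timeout, which the except clause above does not catch; those errors are only swallowed by the generic except in the main loop. A hedged variant that handles both cases inside the helper could catch the common base class:

import requests


def get_http_status_code(url):
    # Return the status code, or the exception when the request itself fails.
    try:
        return requests.get(url, timeout=5).status_code
    except requests.exceptions.RequestException as e:
        return e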
#! /usr/bin/env python
# -*- coding: utf-8 -*-

import requests


def request_status(line):
    conn = requests.get(line)
    if conn.status_code == 200:
        with open('url_200.txt', 'a') as f:
            f.write(line + '\n')
        return line
    else:
        return None


if __name__ == '__main__':
    with open('/1.txt', 'rb') as f:
        for line in f:
            try:
                purge_url = request_status(line.strip('\n'))
            except Exception as e:
                pass
2. Urllib
#! /usr/bin/env python
# coding:utf-8
import os, urllib, linecache
import sys

result = list()
for x in linecache.updatecache(r'1.txt'):
    try:
        a = urllib.urlopen(x.replace('\n', '')).getcode()
        # print x, a
    except Exception, e:
        print e
    if a == 200:
        # result.append(x)  # save
        # result.sort()  # sort the results
        # open('2.txt', 'w').write('%s' % '\n'.join(result))  # write results to a file
        with open('200urllib.txt', 'a') as f:  # r: read-only, w: write, a: append
            f.write(x + '\n')
    else:
        print 'error'

Summary
The scripts above show three ways to batch-check whether a list of URLs is alive: gevent plus urllib2 for concurrent, long-running monitoring with per-URL error statistics, requests for straightforward status-code checks with exception handling, and urllib for a minimal getcode()-based check that appends the working URLs to a file.