Monitoring MySQL/Redis/MongoDB Status with Open-Falcon
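Background:
Open-Falcon is an enterprise-grade monitoring system for internet companies, open-sourced by Xiaomi's operations team. For installation and usage instructions, see the official site: http://open-falcon.org/. It is a fairly complete monitoring solution and exposes a range of APIs: push data in the required format and it draws the graphs, and it also supports alerting, clustering and more.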
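As a rough illustration of "push data in the required format", the sketch below builds the kind of JSON list that the scripts in this post send to the agent. The field spelling (`Metric`, `Endpoint`, `TAGS` and so on) is copied from those scripts; the helper name `build_metric` and the sample values are made up for this example.

```python
import json
import time


def build_metric(endpoint, metric, value, counter_type, step=60, tags="", timestamp=None):
    """Build one data point, using the field names from this post's scripts.

    build_metric is a hypothetical helper, not part of Open-Falcon.
    """
    if counter_type not in ("GAUGE", "COUNTER"):
        raise ValueError("counter_type must be GAUGE or COUNTER")
    return {
        "Endpoint": endpoint,
        "Metric": metric,
        "Timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "Step": step,
        "Value": value,
        "CounterType": counter_type,
        "TAGS": tags,
    }


# Sample values are invented; the endpoint name follows the config example in this post.
points = [
    build_metric("mysql-21:3306", "Com_select", 1024, "COUNTER", timestamp=1500000000),
    build_metric("mysql-21:3306", "ReadWrite_ratio", 3.5, "GAUGE", timestamp=1500000000),
]
body = json.dumps(points, sort_keys=True)
print(body)
```

The scripts in this post send such a body with `requests.post(open_falcon_api, data=body)`, where `open_falcon_api` is the author's agent address `http://192.168.200.86:1988/v1/push`.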
Monitoring:
1) MySQL collection script (mysql_monitor.py)
    #!/bin/env python
    # -*- encoding: utf-8 -*-
    from __future__ import division
    import MySQLdb
    import datetime
    import time
    import os
    import sys
    import fileinput
    import requests
    import json
    import re


    class MySQLMonitorInfo():

        def __init__(self, host, port, user, password):
            self.host = host
            self.port = port
            self.user = user
            self.password = password

        def stat_info(self):
            # Return SHOW GLOBAL STATUS as a dict; an empty dict on failure.
            try:
                m = MySQLdb.connect(host=self.host, user=self.user, passwd=self.password, port=self.port, charset='utf8')
                query = "SHOW GLOBAL STATUS"
                cursor = m.cursor()
                cursor.execute(query)
                Str_string = cursor.fetchall()
                Status_dict = {}
                for Str_key, Str_value in Str_string:
                    Status_dict[Str_key] = Str_value
                cursor.close()
                m.close()
                return Status_dict
            except Exception as e:
                print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
                print(e)
                Status_dict = {}
                return Status_dict

        def engine_info(self):
            # Extract "History list length" from SHOW ENGINE INNODB STATUS.
            try:
                m = MySQLdb.connect(host=self.host, user=self.user, passwd=self.password, port=self.port, charset='utf8')
                _engine_regex = re.compile(r'(History list length) ([0-9]+\.?[0-9]*)\n')
                query = "SHOW ENGINE INNODB STATUS"
                cursor = m.cursor()
                cursor.execute(query)
                Str_string = cursor.fetchone()
                a, b, c = Str_string
                cursor.close()
                m.close()
                return dict(_engine_regex.findall(c))
            except Exception as e:
                print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
                print(e)
                return dict(History_list_length=0)


    if __name__ == '__main__':
        open_falcon_api = 'http://192.168.200.86:1988/v1/push'
        db_list = []
        for line in fileinput.input():
            # Skip blank lines so that split(',') below does not fail on them.
            if line.strip():
                db_list.append(line.strip())
        for db_info in db_list:
            # host,port,user,password,endpoint,metric = db_info.split(',')
            host, port, user, password, endpoint = db_info.split(',')
            timestamp = int(time.time())
            step = 60
            # tags = "port=%s" % port
            tags = ""
            conn = MySQLMonitorInfo(host, int(port), user, password)
            stat_info = conn.stat_info()
            engine_info = conn.engine_info()
            mysql_stat_list = []
            monitor_keys = [
                ('Com_select', 'COUNTER'),
                ('Qcache_hits', 'COUNTER'),
                ('Com_insert', 'COUNTER'),
                ('Com_update', 'COUNTER'),
                ('Com_delete', 'COUNTER'),
                ('Com_replace', 'COUNTER'),
                ('MySQL_QPS', 'COUNTER'),
                ('MySQL_TPS', 'COUNTER'),
                ('ReadWrite_ratio', 'GAUGE'),
                ('Innodb_buffer_pool_read_requests', 'COUNTER'),
                ('Innodb_buffer_pool_reads', 'COUNTER'),
                ('Innodb_buffer_read_hit_ratio', 'GAUGE'),
                ('Innodb_buffer_pool_pages_flushed', 'COUNTER'),
                ('Innodb_buffer_pool_pages_free', 'GAUGE'),
                ('Innodb_buffer_pool_pages_dirty', 'GAUGE'),
                ('Innodb_buffer_pool_pages_data', 'GAUGE'),
                ('Bytes_received', 'COUNTER'),
                ('Bytes_sent', 'COUNTER'),
                ('Innodb_rows_deleted', 'COUNTER'),
                ('Innodb_rows_inserted', 'COUNTER'),
                ('Innodb_rows_read', 'COUNTER'),
                ('Innodb_rows_updated', 'COUNTER'),
                ('Innodb_os_log_fsyncs', 'COUNTER'),
                ('Innodb_os_log_written', 'COUNTER'),
                ('Created_tmp_disk_tables', 'COUNTER'),
                ('Created_tmp_tables', 'COUNTER'),
                ('Connections', 'COUNTER'),
                ('Innodb_log_waits', 'COUNTER'),
                ('Slow_queries', 'COUNTER'),
                ('Binlog_cache_disk_use', 'COUNTER'),
            ]
            for _key, falcon_type in monitor_keys:
                if _key == 'MySQL_QPS':
                    _value = int(stat_info.get('Com_select', 0)) + int(stat_info.get('Qcache_hits', 0))
                elif _key == 'MySQL_TPS':
                    _value = int(stat_info.get('Com_insert', 0)) + int(stat_info.get('Com_update', 0)) + int(stat_info.get('Com_delete', 0)) + int(stat_info.get('Com_replace', 0))
                elif _key == 'Innodb_buffer_read_hit_ratio':
                    try:
                        _value = round((int(stat_info.get('Innodb_buffer_pool_read_requests', 0)) - int(stat_info.get('Innodb_buffer_pool_reads', 0))) / int(stat_info.get('Innodb_buffer_pool_read_requests', 0)) * 100, 3)
                    except ZeroDivisionError:
                        _value = 0
                elif _key == 'ReadWrite_ratio':
                    try:
                        _value = round((int(stat_info.get('Com_select', 0)) + int(stat_info.get('Qcache_hits', 0))) / (int(stat_info.get('Com_insert', 0)) + int(stat_info.get('Com_update', 0)) + int(stat_info.get('Com_delete', 0)) + int(stat_info.get('Com_replace', 0))), 2)
                    except ZeroDivisionError:
                        _value = 0
                else:
                    _value = int(stat_info.get(_key, 0))
                falcon_format = {
                    'Metric': '%s' % (_key),
                    'Endpoint': endpoint,
                    'Timestamp': timestamp,
                    'Step': step,
                    'Value': _value,
                    'CounterType': falcon_type,
                    'TAGS': tags
                }
                mysql_stat_list.append(falcon_format)
            # _key: History list length, reported as Undo_Log_Length
            for _key, _value in engine_info.items():
                _key = "Undo_Log_Length"
                falcon_format = {
                    'Metric': '%s' % (_key),
                    'Endpoint': endpoint,
                    'Timestamp': timestamp,
                    'Step': step,
                    'Value': int(_value),
                    'CounterType': "GAUGE",
                    'TAGS': tags
                }
                mysql_stat_list.append(falcon_format)
            print(json.dumps(mysql_stat_list, sort_keys=True, indent=4))
            requests.post(open_falcon_api, data=json.dumps(mysql_stat_list))

Metric notes: among the collected metrics, COUNTER means the value is shown as a per-second rate, and GAUGE means the value is output as-is.
| Metric | Type | Description |
| --- | --- | --- |
| Undo_Log_Length | GAUGE | Number of unpurged undo transactions (InnoDB history list length) |
| Com_select | COUNTER | SELECTs per second (QPS) |
| Com_insert | COUNTER | INSERTs per second |
| Com_update | COUNTER | UPDATEs per second |
| Com_delete | COUNTER | DELETEs per second |
| Com_replace | COUNTER | REPLACEs per second |
| MySQL_QPS | COUNTER | QPS (Com_select + Qcache_hits) |
| MySQL_TPS | COUNTER | TPS (Com_insert + Com_update + Com_delete + Com_replace) |
| ReadWrite_ratio | GAUGE | Read/write ratio |
| Innodb_buffer_pool_read_requests | COUNTER | InnoDB buffer pool read requests per second |
| Innodb_buffer_pool_reads | COUNTER | Reads from disk per second |
| Innodb_buffer_read_hit_ratio | GAUGE | InnoDB buffer pool hit ratio (%) |
| Innodb_buffer_pool_pages_flushed | COUNTER | InnoDB buffer pool pages flushed to disk per second |
| Innodb_buffer_pool_pages_free | GAUGE | Number of free pages in the InnoDB buffer pool |
| Innodb_buffer_pool_pages_dirty | GAUGE | Number of dirty pages in the InnoDB buffer pool |
| Innodb_buffer_pool_pages_data | GAUGE | Number of data pages in the InnoDB buffer pool |
| Bytes_received | COUNTER | Bytes received per second |
| Bytes_sent | COUNTER | Bytes sent per second |
| Innodb_rows_deleted | COUNTER | Rows deleted from InnoDB tables per second |
| Innodb_rows_inserted | COUNTER | Rows inserted into InnoDB tables per second |
| Innodb_rows_read | COUNTER | Rows read from InnoDB tables per second |
| Innodb_rows_updated | COUNTER | Rows updated in InnoDB tables per second |
| Innodb_os_log_fsyncs | COUNTER | Redo log fsyncs per second |
| Innodb_os_log_written | COUNTER | Bytes written to the redo log per second |
| Created_tmp_disk_tables | COUNTER | On-disk temporary tables created per second |
| Created_tmp_tables | COUNTER | In-memory temporary tables created per second |
| Connections | COUNTER | Connections per second |
| Innodb_log_waits | COUNTER | Waits per second caused by a too-small InnoDB log buffer |
| Slow_queries | COUNTER | Slow queries per second |
| Binlog_cache_disk_use | COUNTER | Times per second the binlog cache was too small and spilled to disk |
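To make the COUNTER/GAUGE distinction concrete, here is a small sketch. `counter_rate` shows how a cumulative counter such as `Com_select` turns into a per-second rate between two samples, and `buffer_read_hit_ratio` recomputes the GAUGE-style hit ratio with the same formula as the script. Both function names and all sample numbers are invented for illustration, and returning `None` when a counter goes backwards (for example after a MySQL restart) is this sketch's own choice, not a statement about how Open-Falcon handles resets.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second rate between two samples of a cumulative counter."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went backwards (e.g. server restart): no meaningful rate.
        return None
    return delta / elapsed


def buffer_read_hit_ratio(status):
    """InnoDB buffer pool hit ratio in percent, same formula as the script."""
    requests = int(status.get("Innodb_buffer_pool_read_requests", 0))
    reads = int(status.get("Innodb_buffer_pool_reads", 0))
    if requests == 0:
        return 0
    return round((requests - reads) / requests * 100, 3)


# Two Com_select samples taken 60 seconds apart (invented numbers).
rate = counter_rate(120000, 1500000000, 126000, 1500000060)

# SHOW GLOBAL STATUS values arrive as strings, hence the int() conversions above.
ratio = buffer_read_hit_ratio({
    "Innodb_buffer_pool_read_requests": "1000",
    "Innodb_buffer_pool_reads": "25",
})
print(rate, ratio)
```

Note that the ratio metrics in the script are computed from totals accumulated since the server started, so they describe the long-run average rather than the last 60 seconds.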
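Usage: the script reads a list of databases from a configuration file and collects from each of them. Each line of the configuration file (mysqldb_list.txt) has the following form:
IP,Port,User,Password,endpoint
192.168.2.21,3306,root,123,mysql-21:3306
192.168.2.88,3306,root,123,mysql-88:3306
Finally, run:
python mysql_monitor.py mysqldb_list.txt

2) Redis collection script (redis_monitor.py)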
    #!/bin/env python
    #-*- coding:utf-8 -*-
    import json
    import time
    import re
    import redis
    import requests
    import fileinput
    import datetime


    class RedisMonitorInfo():

        def __init__(self, host, port, password):
            self.host = host
            self.port = port
            self.password = password

        def stat_info(self):
            # Return the INFO output as a dict; an empty dict on failure.
            try:
                r = redis.Redis(host=self.host, port=self.port, password=self.password)
                stat_info = r.info()
                return stat_info
            except Exception as e:
                print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
                print(e)
                return dict()

        def cmdstat_info(self):
            # Return the INFO commandstats section as a dict; an empty dict on failure.
            try:
                r = redis.Redis(host=self.host, port=self.port, password=self.password)
                cmdstat_info = r.info('Commandstats')
                return cmdstat_info
            except Exception as e:
                print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
                print(e)
                return dict()


    if __name__ == '__main__':
        open_falcon_api = 'http://192.168.200.86:1988/v1/push'
        db_list = []
        for line in fileinput.input():
            # Skip blank lines so that split(',') below does not fail on them.
            if line.strip():
                db_list.append(line.strip())
        for db_info in db_list:
            # host,port,password,endpoint,metric = db_info.split(',')
            host, port, password, endpoint = db_info.split(',')
            timestamp = int(time.time())
            step = 60
            falcon_type = 'COUNTER'
            # tags = "port=%s" % port
            tags = ""
            conn = RedisMonitorInfo(host, int(port), password)

            # Number of calls per command (COUNTER, so shown per second)
            redis_cmdstat_dict = {}
            redis_cmdstat_list = []
            cmdstat_info = conn.cmdstat_info()
            for cmdkey in cmdstat_info:
                redis_cmdstat_dict[cmdkey] = cmdstat_info[cmdkey]['calls']
            for _key, _value in redis_cmdstat_dict.items():
                falcon_format = {
                    'Metric': '%s' % (_key),
                    'Endpoint': endpoint,
                    'Timestamp': timestamp,
                    'Step': step,
                    'Value': int(_value),
                    'CounterType': falcon_type,
                    'TAGS': tags
                }
                redis_cmdstat_list.append(falcon_format)

            # Redis status metrics; add or remove items as needed.
            # String values must be converted to int.
            redis_stat_list = []
            monitor_keys = [
                ('connected_clients', 'GAUGE'),
                ('blocked_clients', 'GAUGE'),
                ('used_memory', 'GAUGE'),
                ('used_memory_rss', 'GAUGE'),
                ('mem_fragmentation_ratio', 'GAUGE'),
                ('total_commands_processed', 'COUNTER'),
                ('rejected_connections', 'COUNTER'),
                ('expired_keys', 'COUNTER'),
                ('evicted_keys', 'COUNTER'),
                ('keyspace_hits', 'COUNTER'),
                ('keyspace_misses', 'COUNTER'),
                ('keyspace_hit_ratio', 'GAUGE'),
                ('keys_num', 'GAUGE'),
            ]
            stat_info = conn.stat_info()
            for _key, falcon_type in monitor_keys:
                # Compute the hit ratio
                if _key == 'keyspace_hit_ratio':
                    try:
                        _value = round(float(stat_info.get('keyspace_hits', 0)) / (int(stat_info.get('keyspace_hits', 0)) + int(stat_info.get('keyspace_misses', 0))), 4) * 100
                    except ZeroDivisionError:
                        _value = 0
                # The fragmentation ratio is a float
                elif _key == 'mem_fragmentation_ratio':
                    _value = float(stat_info.get(_key, 0))
                # Total number of keys across db0..db15
                elif _key == 'keys_num':
                    _value = 0
                    for i in range(16):
                        db_key = 'db' + str(i)
                        _num = stat_info.get(db_key)
                        if _num:
                            _value += int(_num.get('keys'))
                # Everything else is collected as an int
                else:
                    try:
                        _value = int(stat_info[_key])
                    except (KeyError, TypeError, ValueError):
                        continue
                falcon_format = {
                    'Metric': '%s' % (_key),
                    'Endpoint': endpoint,
                    'Timestamp': timestamp,
                    'Step': step,
                    'Value': _value,
                    'CounterType': falcon_type,
                    'TAGS': tags
                }
                redis_stat_list.append(falcon_format)
            load_data = redis_stat_list + redis_cmdstat_list
            print(json.dumps(load_data, sort_keys=True, indent=4))
            requests.post(open_falcon_api, data=json.dumps(load_data))

Metric notes: among the collected metrics, COUNTER means the value is shown as a per-second rate, and GAUGE means the value is output as-is.
| Metric | Type | Description |
| --- | --- | --- |
| connected_clients | GAUGE | Number of connected clients |
| blocked_clients | GAUGE | Number of blocked clients |
| used_memory | GAUGE | Total memory allocated by Redis |
| used_memory_rss | GAUGE | Total memory allocated by the OS |
| mem_fragmentation_ratio | GAUGE | Memory fragmentation ratio, used_memory_rss/used_memory |
| total_commands_processed | COUNTER | Commands executed per second, a fairly accurate QPS |
| rejected_connections | COUNTER | Rejected connections per second |
| expired_keys | COUNTER | Expired keys per second |
| evicted_keys | COUNTER | Evicted keys per second |
| keyspace_hits | COUNTER | Key hits per second |
| keyspace_misses | COUNTER | Key misses per second |
| keyspace_hit_ratio | GAUGE | Key hit ratio (%) |
| keys_num | GAUGE | Number of keys |
| cmdstat_* | COUNTER | Calls per second for each command (keys of INFO commandstats) |
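The two derived Redis metrics, `keyspace_hit_ratio` and `keys_num`, can be worked through on a plain dict shaped like the parsed INFO output. This is only a sketch: the function names and the sample dict are invented, and unlike the script (which loops over db0 to db15) `total_keys` sums every `dbN` entry it finds, which is a deliberate variation for instances configured with more than 16 databases.

```python
def keyspace_hit_ratio(info):
    """Key hit ratio in percent; 0 when there have been no lookups yet."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    if total == 0:
        return 0
    return round(hits / total * 100, 2)


def total_keys(info):
    """Sum the `keys` field over every dbN entry present in the INFO dict."""
    total = 0
    for name, value in info.items():
        if name.startswith("db") and name[2:].isdigit() and isinstance(value, dict):
            total += int(value.get("keys", 0))
    return total


# Invented sample shaped like a parsed INFO reply with two non-empty databases.
sample_info = {
    "keyspace_hits": 900,
    "keyspace_misses": 100,
    "db0": {"keys": 120, "expires": 3, "avg_ttl": 0},
    "db3": {"keys": 30, "expires": 0, "avg_ttl": 0},
}
hit_ratio = keyspace_hit_ratio(sample_info)
keys_num = total_keys(sample_info)
print(hit_ratio, keys_num)
```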
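Usage: the script reads a list of Redis instances from a configuration file and collects from each of them. Each line of the configuration file (redisdb_list.txt) has the following form:
IP,Port,Password,endpoint
192.168.1.56,7021,zhoujy,redis-56:7021
192.168.1.55,7021,zhoujy,redis-55:7021
Finally, run:
python redis_monitor.py redisdb_list.txt

3) MongoDB collection script (mongodb_monitor.py)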
...to be added later

4) Other related monitoring (requires the agent to be installed), for example the following metrics:
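| Metric | Alert condition | Description |
| --- | --- | --- |
| load.1min | all(#3)>10 | Redis server is overloaded; processing capacity drops |
| cpu.idle | all(#3)<10 | CPU idle is too low; processing capacity drops |
| df.bytes.free.percent | all(#3)<20 | Free disk space is below 20%, which affects RDB and AOF persistence on replicas |
| mem.memfree.percent | all(#3)<15 | Free memory is below 15%; Redis risks the OOM killer and swapping |
| mem.swapfree.percent | all(#3)<80 | 20% of swap is in use; risk of degraded Redis performance or OOM |
| net.if.out.bytes | all(#3)>94371840 | Outbound network traffic exceeds 90 MB, affecting Redis response times |
| net.if.in.bytes | all(#3)>94371840 | Inbound network traffic exceeds 90 MB, affecting Redis response times |
| disk.io.util | all(#3)>90 | Disk I/O may be heavily loaded, affecting replica persistence and blocking writes |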
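The conditions above read as "the most recent three data points all satisfy the comparison". The sketch below mimics that rule on a list of samples so the thresholds are easier to reason about. It is an illustration only, not Open-Falcon's implementation: the function name `all_last_n`, the oldest-first ordering of the list and the sample values are all assumptions.

```python
import operator

# Comparison operators that can appear in a condition such as all(#3)>10.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def all_last_n(values, n, op, threshold):
    """True when the newest n samples exist and each satisfies `value op threshold`.

    `values` is assumed to be ordered oldest first.
    """
    if len(values) < n:
        return False
    compare = _OPS[op]
    return all(compare(v, threshold) for v in values[-n:])


# Invented samples: the last three load values all exceed 10, so the rule fires.
load_1min = [4.2, 11.0, 12.5, 10.8]
load_alarm = all_last_n(load_1min, 3, ">", 10)

# The newest cpu.idle sample is 12, so all(#3)<10 does not hold.
cpu_idle = [35, 8, 9, 12]
cpu_alarm = all_last_n(cpu_idle, 3, "<", 10)
print(load_alarm, cpu_alarm)
```

Requiring three consecutive points rather than one is what keeps a single spike from raising an alarm.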
Related documents:
https://github.com/iambocai/falcon-monit-scripts (redis monitor)
https://github.com/ZhuoRoger/redismon (redis monitor)
https://www.cnblogs.com/zhoujinyi/p/6645104.html
Reposted from: https://www.cnblogs.com/jackyzm/p/9600496.html