tsar Metrics Explained

Published: 2023/12/13

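System modules

cpu

Field descriptions

user: Percentage of CPU time spent running user-space processes. Ideally most CPU time is spent here rather than on kernel overhead.
sys: Percentage of CPU time spent in the kernel. A high value suggests a bottleneck somewhere in the system; lower is generally better.
wait: Percentage of CPU time spent waiting for I/O to complete. The system should not spend much time waiting on I/O; a high value points to an I/O bottleneck.
hirq: Percentage of CPU time spent handling hardware interrupts.
sirq: Percentage of CPU time spent handling software interrupts.
util: Total CPU utilization, as a percentage.
nice: Percentage of CPU time spent running user processes whose priority has been adjusted with nice.
steal: Percentage of time the virtual CPU was forced to wait (involuntary wait) while the hypervisor serviced another virtual processor.
ncpu: Total number of CPUs.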

Collection method

CPU usage is computed from the counters in /proc/stat. The basic format of the file is:

cpu  67793686 1353560 66172807 4167536491 2705057 0 195975 609768
cpu0 10529517 944309 11652564 835725059 2150687 0 74605 196726
cpu1 14380773 127146 13908869 832565666 150815 0 31780 108418

The cpu line is the aggregate; cpu0, cpu1, and so on are the individual CPUs. Each line has eight values, measured in ticks:

User time, 67793686
Nice time, 1353560
System time, 66172807
Idle time, 4167536491
Waiting time, 2705057
Hard Irq time, 0
SoftIRQ time, 195975
Steal time, 609768

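Total CPU time = user + nice + system + idle + iowait + irq + softirq + steal
Share of each state = CPU time in that state / total CPU time * 100%
Total utilization (util) is a special case. The current algorithm, with each term expressed as a fraction of total CPU time, is:
util = 1 - idle - iowait - steal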
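The formulas above can be sketched in Python as follows. This is only an illustration: it assumes the percentages are taken over the difference between two /proc/stat samples (the text does not say so explicitly), and the names parse_cpu_line and cpu_percentages are hypothetical, not part of tsar.

```python
def parse_cpu_line(line):
    """Return the eight tick counters of a 'cpu' line from /proc/stat as a dict."""
    names = ["user", "nice", "sys", "idle", "iowait", "hirq", "sirq", "steal"]
    fields = line.split()
    # fields[0] is the label ("cpu", "cpu0", ...); the next eight are counters.
    return dict(zip(names, (int(v) for v in fields[1:9])))


def cpu_percentages(prev, curr):
    """Per-state percentages and util between two samples (assumed delta-based)."""
    delta = {k: curr[k] - prev[k] for k in curr}
    total = sum(delta.values())
    if total <= 0:
        return {}  # no ticks elapsed between the samples
    pct = {k: 100.0 * v / total for k, v in delta.items()}
    # util = 1 - idle - iowait - steal, expressed here in percent
    pct["util"] = 100.0 - pct["idle"] - pct["iowait"] - pct["steal"]
    return pct
```

In practice the two samples would be read from /proc/stat a fixed interval apart.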

mem

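Field descriptions

free: Amount of free physical memory.
used: Amount of memory in use.
buff: Memory used by buffers. A buffer is something that has yet to be "written" to disk.
cach: Memory used by the cache; the operating system keeps frequently accessed data in the cache to speed things up. A cache is something that has been "read" from the disk and stored for later use.
total: Total system memory.
util: Memory utilization.

Collection method

The memory counters are in /proc/meminfo. The key entries are: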

    MemTotal:      7680000 kB
    MemFree:        815652 kB
    Buffers:       1004824 kB
    Cached:        4922556 kB

The entries are self-explanatory. Memory utilization is computed as:
util = (total - free - buff - cach) / total * 100%
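As a rough sketch, the utilization formula can be applied to /proc/meminfo text like this. The function names are hypothetical and the parsing is deliberately minimal.

```python
def parse_meminfo(text):
    """Map each 'Key:  value kB' line of /proc/meminfo to an integer (in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        tokens = rest.split()
        if sep and tokens:
            values[key.strip()] = int(tokens[0])
    return values


def mem_util(info):
    """util = (total - free - buff - cach) / total * 100"""
    total = info["MemTotal"]
    used = total - info["MemFree"] - info["Buffers"] - info["Cached"]
    return 100.0 * used / total
```

With the sample values shown earlier, buffers and cache account for most of the memory, so the resulting utilization is low even though MemFree is small.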

load

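Field descriptions

load1: System load average over the last minute.
load5: System load average over the last five minutes.
load15: System load average over the last fifteen minutes.
runq: Number of tasks on the run queue at sampling time; same meaning as procs_running in /proc/stat.
plit: Number of active tasks in the system at sampling time (tasks that have already exited are not counted).

Collection method

Load data is kept in /proc/loadavg:
0.00 0.01 0.00 1/271 23741
The fields are the 1-minute, 5-minute, and 15-minute load averages, running tasks / total tasks, and the PID of the most recently created process.
The first five values (the three load averages plus the running and total task counts) provide everything tsar needs.
Note: load is only considered high when the load average divided by the number of CPU cores exceeds 1.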
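A small Python sketch of parsing this file and applying the per-core rule might look like the following. The function names and the returned key names are hypothetical; the mapping of the two task counts to runq and plit follows the field descriptions above.

```python
def parse_loadavg(text):
    """Split a /proc/loadavg line into load averages and task counts."""
    load1, load5, load15, tasks, last_pid = text.split()
    running, total = tasks.split("/")
    return {
        "load1": float(load1),
        "load5": float(load5),
        "load15": float(load15),
        "runq": int(running),   # tasks currently runnable
        "plit": int(total),     # total tasks in the system
        "last_pid": int(last_pid),
    }


def is_load_high(load, ncpu):
    """Load counts as high only when load / number of cores exceeds 1."""
    return load / ncpu > 1.0
```

For example, a load1 of 6 on an 8-core machine gives 0.75 per core, which this rule does not treat as high.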

traffic

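Field descriptions

bytin: Inbound traffic, bytes/s.
bytout: Outbound traffic, bytes/s.
pktin: Inbound packets/s.
pktout: Outbound packets/s.

Collection method

The traffic counters come from /proc/net/dev: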

    face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:1291647853895 811582000    0    0    0     0          0         0 1291647853895 811582000    0    0    0     0       0          0
    eth0:853633725380 1122575617    0    0    0     0          0         0 1254282827126 808083790    0    0    0     0       0          0

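The first line labels the columns, and each following line is one network interface. tsar collects the inbound and outbound bytes and packets.
Note that tsar only collects data for interfaces whose names start with eth or em; interfaces such as lo are ignored. Traffic is measured in bytes.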
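The collection rule described above can be sketched in Python roughly as follows. The function names are hypothetical, and summing across all matching interfaces and dividing counter deltas by the sampling interval are assumptions about how the per-second values are derived.

```python
def parse_net_dev(text, prefixes=("eth", "em")):
    """Sum rx/tx byte and packet counters over interfaces matching the prefixes."""
    totals = {"bytin": 0, "pktin": 0, "bytout": 0, "pktout": 0}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not name.strip().startswith(prefixes):
            continue  # header lines, lo, and other interfaces are skipped
        cols = rest.split()
        if len(cols) < 10:
            continue
        totals["bytin"] += int(cols[0])    # receive bytes
        totals["pktin"] += int(cols[1])    # receive packets
        totals["bytout"] += int(cols[8])   # transmit bytes
        totals["pktout"] += int(cols[9])   # transmit packets
    return totals


def traffic_rates(prev, curr, interval):
    """Per-second rates from two cumulative samples taken `interval` seconds apart."""
    return {k: (curr[k] - prev[k]) / interval for k in curr}
```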

tcp

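Field descriptions

active: Number of actively opened TCP connections.
pasive: Number of passively opened TCP connections.
iseg: Number of TCP segments received.
outseg: Number of TCP segments sent.
EstRes: Number of resets that have occurred at ESTABLISHED.
AtmpFa: Number of failed connection attempts.
CurrEs: Number of TCP connections currently in the ESTABLISHED state.
retran: Retransmission rate.

Collection method

The TCP counters are in /proc/net/snmp: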

    Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
    Tcp: 1 200 120000 -1 31702170 14416937 935062 772446 16 1846056224 1426620266 448823 0 5387732

The counters of interest are ActiveOpens, PassiveOpens, AttemptFails, EstabResets, CurrEstab, InSegs, OutSegs, and RetransSegs.
The retransmission rate is computed between consecutive samples as:
retran = (RetransSegs - last RetransSegs) / (OutSegs - last OutSegs) * 100%
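The retransmission formula can be illustrated with a short Python sketch. The helper names are hypothetical; the parser simply pairs the header row of a protocol with its value row in /proc/net/snmp text.

```python
def parse_snmp(text, proto="Tcp"):
    """Pair the header row and value row of one protocol in /proc/net/snmp."""
    rows = [line.split() for line in text.splitlines()
            if line.strip().startswith(proto + ":")]
    if len(rows) < 2:
        return {}
    header, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def retran_rate(prev, curr):
    """retran = delta(RetransSegs) / delta(OutSegs) * 100"""
    out = curr["OutSegs"] - prev["OutSegs"]
    if out <= 0:
        return 0.0  # nothing was sent in the interval
    return 100.0 * (curr["RetransSegs"] - prev["RetransSegs"]) / out
```

Passing proto="Udp" to the same parser would read the Udp rows of the file instead.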

udp

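Field descriptions

idgm: Number of UDP datagrams received.
odgm: Number of UDP datagrams sent.
noport: Number of datagrams received by the UDP layer for which the destination address or port did not exist.
idmerr: Number of invalid datagrams received by the UDP layer.

Collection method

UDP uses the same source file as TCP, /proc/net/snmp: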

    Udp: InDatagrams NoPorts InErrors OutDatagrams
    Udp: 31609577 10708119 0 159885874

io

Field descriptions

rrqms: The number of read requests merged per second that were issued to the device.
wrqms: The number of write requests merged per second that were issued to the device.
rs: The number of read requests that were issued to the device per second.
ws: The number of write requests that were issued to the device per second.
rsecs: The number of sectors read from the device per second.
wsecs: The number of sectors written to the device per second.
rqsize: The average size (in sectors) of the requests that were issued to the device.
qusize: The average queue length of the requests that were issued to the device.
await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
svctm: The average service time (in milliseconds) for I/O requests that were issued to the device.
util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

Collection method

The I/O counters are in /proc/diskstats, for example:

    202    0 xvda 12645385 1235409 416827071 59607552 193111576 258112651 3679534806 657719704 0 37341324 717325100
    202    1 xvda1 421 2203 3081 9888 155 63 421 1404 0 2608 11292

Each line has the following fields:

major: Major device number.
minor: Minor device number. The major number identifies the device driver (device type); the minor number identifies the specific device or partition.
name: Device name.
rd_ios: Number of reads completed. This is the total number of reads completed successfully.
rd_merges: Number of reads merged. Reads and writes which are adjacent to each other may be merged for efficiency. Thus two 4K reads may become one 8K read before it is ultimately handed to the disk, and so it will be counted (and queued) as only one I/O.
rd_sectors: Number of sectors read. This is the total number of sectors read successfully.
rd_ticks: Number of milliseconds spent reading. This is the total number of milliseconds spent by all reads.
wr_ios: Number of writes completed. This is the total number of writes completed successfully.
wr_merges: Number of writes merged. Merging works as described for rd_merges.
wr_sectors: Number of sectors written. This is the total number of sectors written successfully.
wr_ticks: Number of milliseconds spent writing. This is the total number of milliseconds spent by all writes.
cur_ios: Number of I/Os currently in progress. The only field that should go to zero. Incremented as requests are given to the appropriate request_queue_t and decremented as they finish.
ticks: Number of milliseconds spent doing I/O.
aveq: Weighted number of milliseconds spent doing I/O.

The value of every field listed above can be computed from these counters:

double n_ios = rd_ios + wr_ios;
double n_ticks = rd_ticks + wr_ticks;
double n_kbytes = (rd_sectors + wr_sectors) / 2;
st_array[0] = rd_merges / (inter * 1.0);
st_array[1] = wr_merges / (inter * 1.0);
st_array[2] = rd_ios / (inter * 1.0);
st_array[3] = wr_ios / (inter * 1.0);
st_array[4] = rd_sectors / (inter * 2.0);
st_array[5] = wr_sectors / (inter * 2.0);
st_array[6] = n_ios ? n_kbytes / n_ios : 0.0;
st_array[7] = aveq / (inter * 1000);
st_array[8] = n_ios ? n_ticks / n_ios : 0.0;
st_array[9] = n_ios ? ticks / n_ios : 0.0;
st_array[10] = ticks / (inter * 10.0); /* percentage! */
/* st_array holds the values tsar displays, one per field */

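Notes:

A sector is normally 512 bytes, which is why some values are divided by 2 (to convert sectors to KB).
ws is the number of writes that actually reach the I/O device, while wrqms is the number of write requests that were merged. The two cannot be meaningfully compared, because there is no way to know how many requests will be merged. For example, suppose 100 read system calls are issued, each reading 4K, and all 100 are contiguous. Since a disk typically allows a maximum request size of 256KB, the block layer merges the 100 reads into 2 requests, one of 256KB and one of 144KB. rrqms is then 100, because all 100 requests were merged regardless of how many requests resulted, and rs is 2, because 2 requests remain at the end.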
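For readers who prefer to experiment outside of C, the calculation above can be mirrored in Python. This sketch assumes the counters fed into the formulas are differences between two /proc/diskstats samples taken inter seconds apart; the function names are hypothetical.

```python
FIELDS = ["rd_ios", "rd_merges", "rd_sectors", "rd_ticks",
          "wr_ios", "wr_merges", "wr_sectors", "wr_ticks",
          "cur_ios", "ticks", "aveq"]


def parse_diskstats_line(line):
    """Return (device name, counters dict) for one /proc/diskstats line."""
    parts = line.split()
    return parts[2], dict(zip(FIELDS, (int(v) for v in parts[3:14])))


def io_metrics(prev, curr, inter):
    """Apply the st_array formulas to the delta between two samples."""
    d = {k: curr[k] - prev[k] for k in FIELDS}
    n_ios = d["rd_ios"] + d["wr_ios"]
    n_ticks = d["rd_ticks"] + d["wr_ticks"]
    n_kbytes = (d["rd_sectors"] + d["wr_sectors"]) / 2.0  # 512-byte sectors
    return {
        "rrqms": d["rd_merges"] / inter,
        "wrqms": d["wr_merges"] / inter,
        "rs": d["rd_ios"] / inter,
        "ws": d["wr_ios"] / inter,
        "rsecs": d["rd_sectors"] / (inter * 2.0),
        "wsecs": d["wr_sectors"] / (inter * 2.0),
        "rqsize": n_kbytes / n_ios if n_ios else 0.0,
        "qusize": d["aveq"] / (inter * 1000.0),
        "await": n_ticks / n_ios if n_ios else 0.0,
        "svctm": d["ticks"] / n_ios if n_ios else 0.0,
        "util": d["ticks"] / (inter * 10.0),  # percentage
    }
```

The cur_ios field is a gauge rather than a cumulative counter, so its delta is computed but not used.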

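partition

Field descriptions

bfree: Free bytes in the partition.
bused: Bytes in use in the partition.
btotl: Total size of the partition.
util: Partition utilization.

Collection method

tsar first reads the partition list from /etc/mtab, then calls statfs on each partition to query file system information, which includes: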

    struct statfs {
    long f_type; /* file system type */
    long f_bsize; /* optimal transfer block size */
    long f_blocks; /* total data blocks in the file system */
    long f_bfree; /* free blocks */
    long f_bavail; /* free blocks available to unprivileged users */
    long f_files; /* total file nodes */
    long f_ffree; /* free file nodes */
    fsid_t f_fsid; /* file system ID */
    long f_namelen; /* maximum length of file names */
    };

From these, tsar computes the values it needs: partition size in bytes = number of blocks * block size = f_blocks * f_bsize
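A minimal Python sketch of the same arithmetic follows. Python exposes os.statvfs rather than statfs, and statvfs reports block counts in units of f_frsize, so the wrapper uses that field. The util formula (used / total) is an assumption, since the text does not spell it out, and the function names are hypothetical.

```python
import os


def partition_stats(f_blocks, f_bfree, block_size):
    """Convert block counts to the byte values tsar reports."""
    btotl = f_blocks * block_size
    bfree = f_bfree * block_size
    bused = btotl - bfree
    # Assumed definition of util: used bytes as a share of total bytes.
    util = 100.0 * bused / btotl if btotl else 0.0
    return {"bfree": bfree, "bused": bused, "btotl": btotl, "util": util}


def partition_stats_for(path):
    """Query a mounted file system (Unix only) and compute its stats."""
    st = os.statvfs(path)
    return partition_stats(st.f_blocks, st.f_bfree, st.f_frsize)
```

Calling partition_stats_for("/") on a Linux host would report the root file system.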

pcsw

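Field descriptions

cswch: Number of context switches.
proc: Number of newly created processes.

Collection method

The counters are in /proc/stat:

    ctxt 19873315174
    processes 296444211

These are, respectively, the number of context switches and the number of processes created since boot.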
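Because both counters are cumulative, a collector has to difference two samples. The sketch below assumes the reported values are per-second rates over the sampling interval, which the text does not state; the function names are hypothetical.

```python
def parse_pcsw(text):
    """Extract the ctxt and processes counters from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in ("ctxt", "processes"):
            counters[parts[0]] = int(parts[1])
    return counters


def pcsw_rates(prev, curr, interval):
    """Context switches and process creations per second (assumed definition)."""
    return {
        "cswch": (curr["ctxt"] - prev["ctxt"]) / interval,
        "proc": (curr["processes"] - prev["processes"]) / interval,
    }
```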

tcpx

Field descriptions

recvq sendq est twait fwait1 fwait2 lisq lising lisove cnest ndrop edrop rdrop pdrop kdrop
These stand for, respectively:
tcprecvq tcpsendq tcpest tcptimewait tcpfinwait1 tcpfinwait2 tcplistenq tcplistenincq tcplistenover tcpnconnest tcpnconndrop tcpembdrop tcprexmitdrop tcppersistdrop tcpkadrop

Collection method

The counters come from /proc/net/netstat and /proc/net/snmp.
The data used includes:

    TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSPassive PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPPrequeued TCPDirectCopyFromBacklog TCPDirectCopyFromPrequeue TCPPrequeueDropped TCPHPHits TCPHPHitsToUser TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLoss TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPForwardRetrans TCPSlowStartRetrans TCPTimeouts TCPRenoRecoveryFail TCPSackRecoveryFail TCPSchedulerFailed TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnSyn TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures
    TcpExt: 0 0 0 80 539 0 0 0 0 0 3733709 51268 0 0 0 80 5583301 5966 104803 146887 146887 6500405 39465075 2562794034 0 689613557 2730596 540646233 234702206 0 44187 2066 94 240 0 114 293 1781 7221 60514 185158 2 2 3403 400 107505 5860 24813 174014 0 2966 7 168787 106151 40 32851 2 0 2180 9862 0 15999 0 0 0

tsar locates the specific fields it needs and reads their values.
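Since /proc/net/netstat pairs a header row with a value row, looking up a field by name is straightforward. The Python sketch below is an illustration with a hypothetical function name; it assumes header and value rows strictly alternate.

```python
def parse_netstat(text):
    """Return {section: {counter: value}} from header/value row pairs."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    sections = {}
    # Rows are assumed to alternate: header row, then its value row.
    for header, values in zip(rows[0::2], rows[1::2]):
        if header[0] != values[0]:
            continue  # not a matching header/value pair
        sections[header[0].rstrip(":")] = {
            name: int(value) for name, value in zip(header[1:], values[1:])
        }
    return sections
```

A counter such as ListenOverflows can then be read as parse_netstat(text)["TcpExt"]["ListenOverflows"].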

percpu ncpu

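Field descriptions

The fields are the same as in the cpu module, except that data can be collected for each individual CPU.

Collection method

Same as the cpu module.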

pernic

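Field descriptions

The fields are the same as in the traffic module, except that data can be collected for each individual network interface.

Collection method

Same as the traffic module.

Application modules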

proc

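Field descriptions

user: User-mode CPU usage of the process.
sys: Kernel-mode CPU usage of the process.
total: Total CPU usage of the process.
mem: Memory usage of the process, as a percentage.
RSS: Resident set size of the process: the part of its memory that is held in physical memory and has not been swapped out to disk. It includes code, data, and stack.
read: Bytes read by the process through I/O.
write: Bytes written by the process through I/O.

Collection method

Counter files

/proc/pid/stat: CPU information for the process
/proc/pid/status: memory information for the process
/proc/pid/io: read/write I/O information for the process

Note: the name of the process to collect must be configured in /etc/tsar/tsar.conf as mod_proc on procname. tsar then looks up the PID of procname and collects its data.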

nginx

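Field descriptions

Accept: Total number of new connections accepted.
Handle: Total number of connections handled.
Reqs: Total number of requests.
Active: Number of active connections, equal to Read + Write + Wait.
Read: Number of connections reading request data.
Write: Number of connections writing response data to the client.
Wait: Number of idle keep-alive connections.
Qps: Requests handled per second.
Rt: Average response time, in ms.

Collection method

tsar requests a specific URL exposed by the nginx status module; for details see https://github.com/taobao/tsar-mod_nginx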

    location = /nginx_status {
        stub_status on;
    }

The response looks like:

    Active connections: 1
    server accepts handled requests request_time
    24 24 7 0
    Reading: 0 Writing: 1 Waiting: 0

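Make sure nginx is configured with this location and that curl http://localhost/nginx_status returns the data above.
If nginx does not listen on port 80, specify the port in the configuration file /etc/tsar/tsar.conf by changing mod_nginx on to mod_nginx on 8080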
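To show how the status page maps to the fields, here is a hedged Python sketch that parses the response text. The function names are hypothetical, and the Qps and Rt definitions (request delta per second, and request_time delta divided by request delta) are assumptions rather than confirmed tsar behavior.

```python
import re


def parse_stub_status(text):
    """Extract counters from the status page text shown above."""
    stats = {}
    m = re.search(r"Active connections:\s*(\d+)", text)
    if m:
        stats["active"] = int(m.group(1))
    # Counter line: accepts handled requests [request_time]
    m = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)(?:[ \t]+(\d+))?[ \t]*$",
                  text, re.M)
    if m:
        stats["accept"] = int(m.group(1))
        stats["handle"] = int(m.group(2))
        stats["reqs"] = int(m.group(3))
        stats["request_time"] = int(m.group(4)) if m.group(4) else 0
    m = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if m:
        stats["read"], stats["write"], stats["wait"] = (int(g) for g in m.groups())
    return stats


def nginx_rates(prev, curr, interval):
    """Assumed definitions: qps = d(reqs)/interval, rt = d(request_time)/d(reqs)."""
    reqs = curr["reqs"] - prev["reqs"]
    spent = curr["request_time"] - prev["request_time"]
    return {"qps": reqs / interval, "rt": spent / reqs if reqs else 0.0}
```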

The nginx_code and nginx_domain modules are similar. Their nginx configuration is:

    req_status_zone server "$host" 20M;
    req_status server;
    location /traffic_status {
            req_status_show;
    }

Requesting curl http://localhost/traffic_status returns data in the following form:
localhost,0,0,2,2,2,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0

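The fields in the response are:

kv Value of the variable defined by the req_status_zone directive; here it is the domain
bytes_in_total Total traffic received from clients
bytes_out_total Total traffic sent to clients
conn_total Total connections handled
req_total Total requests handled
2xx Total 2xx responses
3xx Total 3xx responses
4xx Total 4xx responses
5xx Total 5xx responses
other Total of all other responses
rt_total Total response time (rt)
upstream_req Total requests that needed to reach an upstream
upstream_rt Total rt of upstream requests
upstream_tries Total upstream attempts
200 Total 200 responses
206 Total 206 responses
302 Total 302 responses
304 Total 304 responses
403 Total 403 responses
404 Total 404 responses
416 Total 416 responses
499 Total 499 responses
500 Total 500 responses
502 Total 502 responses
503 Total 503 responses
504 Total 504 responses
508 Total 508 responses
detail_other Total responses with a status code other than the 13 listed above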
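Given the field order above, each comma-separated line can be turned into a named record. This Python sketch is only illustrative; the function name is hypothetical, and it assumes one line per kv value with the fields in exactly the order listed.

```python
FIELDS = [
    "bytes_in_total", "bytes_out_total", "conn_total", "req_total",
    "2xx", "3xx", "4xx", "5xx", "other",
    "rt_total", "upstream_req", "upstream_rt", "upstream_tries",
    "200", "206", "302", "304", "403", "404", "416", "499",
    "500", "502", "503", "504", "508", "detail_other",
]


def parse_traffic_status(text):
    """Return {kv: {field: value}} for each comma-separated status line."""
    result = {}
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) < 2:
            continue  # blank or malformed line
        # parts[0] is the kv (domain); the rest follow the documented order.
        result[parts[0]] = {
            name: int(value) for name, value in zip(FIELDS, parts[1:])
        }
    return result
```

Rates such as per-domain qps would then come from differencing req_total between two samples.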

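If there are too many domains, or the port is not 80, a dedicated configuration file is needed, with contents such as:
port=8080 # nginx port
top=10 # maximum number of domains to collect, ranked by total request count
domain=a.com b.com # specific domains to collect, separated by spaces, commas, or tabs
Specify the path of this file in /etc/tsar/tsar.conf: mod_nginx_domain on /tmp/my.conf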

nginx_domain_traffic

The nginx configuration is:

    req_status_zone server "$host" 20M;
    req_status server;

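    # The req_status_zone_add_indecator directive appends new fields to the end of each line of req status output
    # The fields added here report the nginx variables $2xx_bytes_sent, $3xx_bytes_sent, $4xx_bytes_sent, $5xx_bytes_sent
    # $2xx_bytes_sent: amount of data sent to the client when the response is 2xx (0 if the response is not 2xx)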
    req_status_zone_add_indecator server $2xx_bytes_sent $3xx_bytes_sent $4xx_bytes_sent $5xx_bytes_sent;

    location /traffic_status {
            req_status_show;
    }

Sample output:

   module004033.sqa.cm4 tsar $ tsar --nginx_domain_traffic -li1
   Time              -----------------localhost:8080----------------- ----------------www.foo.com:8080----------------
   Time               bytin  bytout  2XXout  3XXout  4XXout  5XXout    bytin  bytout  2XXout  3XXout  4XXout  5XXout
   09/01/15-13:45:48   0.00    0.00    0.00    0.00    0.00    0.00   410.1K   16.6M   16.6M    0.00    0.00    0.00
   09/01/15-13:45:49   0.00    0.00    0.00    0.00    0.00    0.00   407.8K   16.5M   16.5M    0.00    0.00    0.00
   09/01/15-13:45:51 159.0K  287.4K    0.00    0.00    0.00  287.4K   258.6K   10.5M   10.5M    0.00    0.00    0.00
   09/01/15-13:45:52 245.5K  443.5K    0.00    0.00    0.00  443.5K   224.2K    9.1M    9.1M    0.00    0.00    0.00

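Field descriptions:

bytin: Request bytes received, bytes/s
bytout: Response bytes sent, bytes/s
2XXout: Bytes sent in 2XX responses, bytes/s
3XXout: Bytes sent in 3XX responses, bytes/s
4XXout: Bytes sent in 4XX responses, bytes/s
5XXout: Bytes sent in 5XX responses, bytes/s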

nginx_ups

Reports information about nginx upstreams.
The nginx configuration is:

    req_status_zone server "$host" 20M;
    req_status server;
    req_status_zone_add_indecator server $response_fbt_time $upstream_response_fbt_time $upstream_response_length;

    location /traffic_status {
            req_status_show;
    }

Sample output:

     module004033.sqa.cm4 tsar $ tsar --nginx_ups -li1
     Time              ----------------------------nginx_ups---------------------------
     Time               traff     qps     4XX     5XX    rqps      rt     fbt    ufbt
     09/01/15-16:26:29  15.8M    3.9K    3.9K    0.00    0.00    9.7K    9.7K    9.7K
     09/01/15-16:26:30  15.8M    3.9K    3.9K    0.00    0.00    9.7K    9.7K    9.7K
     09/01/15-16:26:31   4.9M    1.2K    1.2K    0.00    0.00    3.0K    3.0K    3.0K

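Field descriptions:

traff: Traffic of response bodies returned by the backend (HTTP response headers excluded)
qps: Backend qps
rqps: Total backend qps (retry qps + backend qps)
4XX: qps of 4XX status codes returned by the backend
5XX: qps of 5XX status codes returned by the backend
rt: Backend response time
fbt: Tengine time to first byte
ufbt: Backend time to first byte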

squid

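Field descriptions

qps: Requests per second.
rt: Average response time.
r_hit: Request hit ratio.
b_hit: Byte hit ratio.
d_hit: Disk hit ratio.
m_hit: Memory hit ratio.
fdused: Number of file desc currently in use
fdque: Files queued for open
objs: StoreEntries
inmem: StoreEntries with MemObjects
hot: Hot Object Cache Items
size: Mean Object Size

Collection method

The data is obtained from squid's mgrinfo output. Some fields rely on patches and may not apply to external versions of squid.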

haproxy

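Field descriptions

stat: Status; 1 means normal.
uptime: Time since startup.
conns: Total number of connections.
qps: Requests per second.
hit: Cache hit ratio when haproxy caching is enabled.
rt: Average response time, in ms.

Collection method

haproxy has been patched so that statistics can be aggregated in multi-process mode. tsar accesses haproxy's admin status page locally and parses the result.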

lvs

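Field descriptions

stat: lvs status; 1 means normal.
conns: Total number of connections.
pktin: Packets received.
pktout: Packets sent.
bytin: Bytes received.
bytout: Bytes sent.

Collection method

tsar reads the lvs statistics file /proc/net/ip_vs_stats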

apache

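See https://github.com/kongjian/tsar-apache

tcprt

Private application; omitted.

swift

Private application; omitted.

cgcpu/cgmem/cgblkio

Private application; omitted.

trafficserver

To be added.

tmd

Private application; omitted.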
