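MySQL instance stall analysis: a case study (threads in the statistics state)
This article walks through a real incident to analyze a MySQL stall in which threads were stuck in the statistics state.
1. Symptoms
One afternoon, during the off-peak period after working hours, one database on a production MySQL instance suddenly reported a large number of slow SQL statements, all in the statistics state. When the same statements were run again afterwards, they completed quickly. The threads in the statistics state had one thing in common: they were all querying views. Monitoring for that time window showed no noticeable update/delete/insert activity.
Using our snapshot program, we analyzed the InnoDB status captured at the time and found the following: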
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 17208994
--Thread 139964610234112 has waited at srv0srv.cc line 2132 for 14.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x1635a00 created in file dict0dict.cc line 900
a writer (thread id 139964610234112) has reserved it in mode wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file row0purge.cc line 720
Last time write locked in file /home/admin/146_20161018140650857_13830810_code/rpm_workspace/storage/innobase/srv/srv0srv.cc line 2132
OS WAIT ARRAY INFO: signal count 256984450
Mutex spin waits 626367674, rounds 2776951802, OS waits 1973672
RW-shared spins 149944457, rounds 1650148561, OS waits 3972058
RW-excl spins 72090467, rounds 2017802579, OS waits 11148264
Spin rounds per wait: 4.43 mutex, 11.01 RW-shared, 27.99 RW-excl
...
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: doing file i/o (read thread) ev set
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: doing file i/o (read thread) ev set
I/O thread 6 state: doing file i/o (write thread) ev set
I/O thread 7 state: waiting for i/o request (write thread)
I/O thread 8 state: waiting for i/o request (write thread)
I/O thread 9 state: waiting for i/o request (write thread)
Pending normal aio reads: 18 [0, 12, 0, 6] , aio writes: 1 [1, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
1346747614 OS file reads, 2869418806 OS file writes, 524616747 OS fsyncs
22 pending preads, 1 pending pwrites
6.00 reads/s, 16384 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
...
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
38 read views open inside InnoDB
Main thread process no. 34414, id 139964610234112, state: enforcing dict cache limit
Number of rows inserted 2546811699, updated 1708150459, deleted 1004154696, read 413168628410
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 54.19 reads/s
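2. Analysis
From the output above, Thread 139964610234112 is the main thread. It had been waiting for 14 seconds at srv0srv.cc:2132 on a semaphore: an exclusive (X) lock on the RW-latch created at dict0dict.cc:900. At first glance this looks odd: the main thread appears to be waiting on a latch that it has itself reserved. (The wait_ex mode together with "number of readers 1" indicates that the writer has reserved the latch and is waiting for a remaining reader to release it.)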
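The reading of the SEMAPHORES section described above can be automated for snapshot files. The following is a minimal sketch, assuming the status text has the same line layout as the excerpt in this article; the function name long_semaphore_waits and the 5-second threshold are illustrative choices, not part of any MySQL tool.

```python
import re

# Matches the "--Thread ... has waited at ..." lines of the SEMAPHORES section.
WAIT_RE = re.compile(
    r"--Thread (?P<thread>\d+) has waited at (?P<file>\S+) line (?P<line>\d+) "
    r"for (?P<seconds>[\d.]+) seconds the semaphore:"
)
# Matches the "Main thread process no. ..." line of the ROW OPERATIONS section.
MAIN_RE = re.compile(
    r"Main thread process no\. \d+, id (?P<thread>\d+), state: (?P<state>.+)"
)


def long_semaphore_waits(status_text, min_seconds=5.0):
    """Return semaphore waits of at least min_seconds (hypothetical helper).

    Each entry records the waiting thread, the source location, the wait
    time, and whether the waiter is the InnoDB main thread.
    """
    main = MAIN_RE.search(status_text)
    main_id = main.group("thread") if main else None
    waits = []
    for m in WAIT_RE.finditer(status_text):
        seconds = float(m.group("seconds"))
        if seconds >= min_seconds:
            waits.append({
                "thread": m.group("thread"),
                "where": f'{m.group("file")}:{m.group("line")}',
                "seconds": seconds,
                "is_main_thread": m.group("thread") == main_id,
            })
    return waits


# Lines copied from the status output shown in this article.
SAMPLE = """\
--Thread 139964610234112 has waited at srv0srv.cc line 2132 for 14.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x1635a00 created in file dict0dict.cc line 900
Main thread process no. 34414, id 139964610234112, state: enforcing dict cache limit
"""

waits = long_semaphore_waits(SAMPLE)
```

Run against the sample, this yields a single wait at srv0srv.cc:2132, flagged as belonging to the main thread.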
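The environment is Alibaba Cloud RDS (based on MySQL 5.6.16-log), whose source code we cannot obtain. Looking at the 5.6.35 source instead, the line numbers do not match. Fortunately, the end of the output above shows the main thread state: enforcing dict cache limit, which appears in srv0srv.cc in the function srv_master_do_active_tasks(), at around line 2137: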
    if (cur_time % SRV_MASTER_DICT_LRU_INTERVAL == 0) {
        srv_main_thread_op_info = "enforcing dict cache limit";
        srv_master_evict_from_table_cache(50);
        MONITOR_INC_TIME_IN_MICRO_SECS(
            MONITOR_SRV_DICT_LRU_MICROSECOND, counter_time);
    }
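The wait is most likely inside the call to srv_master_evict_from_table_cache(), which evicts entries from the InnoDB table cache. (It does not always evict: it first checks whether the cache is over its limit, and the argument 50 means that only 50% of the unused table list is scanned.) srv_master_evict_from_table_cache():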
srv_master_evict_from_table_cache(
/*==============================*/
    ulint pct_check) /*!< in: max percent to check */
{
    ulint n_tables_evicted = 0;

    /* Table eviction needs the dictionary latch in X mode. */
    rw_lock_x_lock(&dict_operation_lock);
    dict_mutex_enter_for_mysql();

    n_tables_evicted = dict_make_room_in_cache( /* defined in dict0dict.cc */
        innobase_get_table_cache_size(), pct_check);

    dict_mutex_exit_for_mysql();
    rw_lock_x_unlock(&dict_operation_lock);

    return(n_tables_evicted);
}
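The 14-second wait occurred while acquiring the latch at rw_lock_x_lock(&dict_operation_lock). This latch is created in the data dictionary module, in dict0dict.cc:dict_init() at around line 1065, which is broadly consistent with the InnoDB status output.
As for dict_operation_lock, the source comment explains it directly: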
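To make the "scan only 50% of the list" behaviour concrete, here is a simplified Python model of an eviction pass. It reflects my reading of dict_make_room_in_cache() in the 5.6 source and is not a copy of it; the list layout, the can_evict callback, and the sample table names are all assumptions made for illustration.

```python
def make_room_in_cache(lru, max_tables, pct_check, can_evict):
    """Simplified model of one table-cache eviction pass.

    lru[0] is the most recently used table and lru[-1] the least recently
    used. The list is modified in place; the return value is the number of
    tables evicted. Only the oldest pct_check percent of the list is
    examined, so the cache can stay above max_tables after the pass.
    """
    length = len(lru)
    if length < max_tables:
        return 0  # cache is within its limit: nothing to do

    # Stop once the scan position reaches this point in the list.
    check_up_to = length - (length * pct_check) // 100

    evicted = 0
    position = length      # 1-based position of the entry being examined
    index = length - 1     # start from the least recently used entry
    while index >= 0 and position > check_up_to and (length - evicted) > max_tables:
        if can_evict(lru[index]):
            del lru[index]
            evicted += 1
        index -= 1
        position -= 1
    return evicted


# Hypothetical cache of 10 tables with a limit of 6; t8 and t9 are "in use".
tables = [f"t{i}" for i in range(10)]
in_use = {"t8", "t9"}
evicted = make_room_in_cache(tables, max_tables=6, pct_check=50,
                             can_evict=lambda t: t not in in_use)
```

In this example only the five oldest entries are examined, three of them are evicted, and the cache ends the pass with seven tables, still one above its limit.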
/** @brief the data dictionary rw-latch protecting dict_sys
table create, drop, etc. reserve this in X-mode;
implicit or backround operations purge, rollback, foreign key checks reserve this in S-mode;
we cannot trust that MySQL protects implicit or background operations a table drop since MySQL does not know of them;
therefore we need this; NOTE: a transaction which reserves this must keep book on the mode in trx_t::dict_operation_lock_mode */
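Evicting table definitions from the cache takes dict_operation_lock in X mode, but the available information does not show which other holder of the dictionary lock was in the way. Our first suspicion was that table_definition_cache, table_open_cache, or innodb_open_files was set too low: views are usually multi-table joins, so they consume open-table slots more readily, which could cause constant cache eviction and therefore lock contention. A check of these settings turned up nothing wrong, and in any case that would hardly explain a 14-second wait. For their settings and how they relate, see my article on the impact of table_open_cache and table_definition_cache on MySQL (see the references at the end of the article).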
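One way to check whether the table caches are under pressure is to compare two samples of the Opened_tables and Opened_table_definitions status counters and see how fast they grow. The sketch below assumes the samples were already collected into dictionaries (for example from SHOW GLOBAL STATUS); the function name open_rate and the numbers are hypothetical, and what counts as "too fast" depends on the workload.

```python
def open_rate(before, after, interval_seconds,
              counters=("Opened_tables", "Opened_table_definitions")):
    """Return the per-second growth of each counter between two samples."""
    return {
        name: (after[name] - before[name]) / interval_seconds
        for name in counters
    }


# Hypothetical samples taken 60 seconds apart.
sample_before = {"Opened_tables": 1000, "Opened_table_definitions": 400}
sample_after = {"Opened_tables": 1030, "Opened_table_definitions": 400}

rates = open_rate(sample_before, sample_after, interval_seconds=60)
```

A rate that stays near zero, as for Opened_table_definitions here, suggests the corresponding cache is not being churned.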
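So a different approach was needed. The processlist showed 13 threads that had been in the statistics state for a long time. This state means the server is calculating statistics in order to build a query execution plan. If a thread stays in this state for a long time, the disk I/O is probably performing poorly or the disk is busy with other work.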
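Counting such threads in a saved processlist snapshot is straightforward. The sketch below assumes each row is a dictionary keyed by the column names of information_schema.PROCESSLIST (ID, TIME, STATE, INFO and so on); the helper name stuck_in_state, the threshold, and the sample rows are made up for illustration.

```python
def stuck_in_state(rows, state="statistics", min_seconds=5):
    """Return rows whose STATE matches and whose TIME is at least min_seconds."""
    return [
        row for row in rows
        if (row.get("STATE") or "").lower() == state
        and row.get("TIME", 0) >= min_seconds
    ]


# Hypothetical snapshot rows; the view names are invented.
snapshot = [
    {"ID": 101, "TIME": 14, "STATE": "statistics", "INFO": "SELECT ... FROM v_orders"},
    {"ID": 102, "TIME": 12, "STATE": "statistics", "INFO": "SELECT ... FROM v_users"},
    {"ID": 103, "TIME": 0, "STATE": "Sending data", "INFO": "SELECT 1"},
    {"ID": 104, "TIME": 300, "STATE": None, "INFO": None},
]

stuck = stuck_in_state(snapshot)
stuck_ids = [row["ID"] for row in stuck]
```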
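At this point, note the line Pending normal aio reads: 18 [0, 12, 0, 6] in the output at the top: 18 read I/Os were pending (the innodb_data_pending_reads monitoring graph actually peaked at 50), and three I/O threads were busy in the snapshot (read threads 3 and 5 and write thread 6). In addition, innodb_buffer_pool_pages_flushed did not change at all in the 10 seconds before the incident, meaning that no dirty pages were successfully flushed to disk. This instance is also a replica, and in the 10 seconds before the incident it briefly showed a replication lag of more than 20 seconds.
(All of this concerns data from before 18:59:05. After that point, recovery usually brings a brief spike in rows read, which should not be mistaken for the cause.)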
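The pending-read figures and I/O thread states discussed above can also be pulled out of a status snapshot programmatically. This is a minimal sketch that assumes the FILE I/O section is formatted as in the excerpt in this article; io_summary is a hypothetical helper name.

```python
import re

# "Pending normal aio reads: 18 [0, 12, 0, 6] , ..." (per-thread list optional).
PENDING_RE = re.compile(
    r"Pending normal aio reads: (?P<reads>\d+)(?: \[(?P<per_thread>[\d, ]+)\])?"
)
# "I/O thread 3 state: doing file i/o (read thread) ev set"
THREAD_RE = re.compile(r"I/O thread (\d+) state: (.+?) \((.+?) thread\)")


def io_summary(status_text):
    """Summarize pending reads and busy I/O threads from a status snapshot."""
    pending = PENDING_RE.search(status_text)
    per_thread = []
    if pending and pending.group("per_thread"):
        per_thread = [int(x) for x in pending.group("per_thread").split(",")]
    busy = [
        (int(number), kind)
        for number, state, kind in THREAD_RE.findall(status_text)
        if state.startswith("doing file i/o")
    ]
    return {
        "pending_reads": int(pending.group("reads")) if pending else None,
        "per_read_thread": per_thread,
        "busy_threads": busy,
    }


# Lines copied from the status output shown in this article.
SAMPLE = """\
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: doing file i/o (read thread) ev set
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: doing file i/o (read thread) ev set
I/O thread 6 state: doing file i/o (write thread) ev set
Pending normal aio reads: 18 [0, 12, 0, 6] , aio writes: 1 [1, 0, 0, 0] ,
"""

summary = io_summary(SAMPLE)
```

Tracking these values across consecutive snapshots, rather than in a single one, makes it easier to see when the I/O backlog started to build.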
3. Conclusion
Combining the enforcing dict cache limit state with the statistics threads and pending I/O, I found two related bug reports:
https://bugs.launchpad.net/percona-server/+bug/1500176
https://bugs.mysql.com/bug.php?id=84424
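The first can occur when pt-online-schema-change is used to alter the structure of a partitioned table, but its status is currently Undecided, and our environment has no partitioned tables, no foreign keys, and no schema changes in progress. The second is in fact "Not a bug":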
Thank you for your bug report. This is, however, not a bug, but a very well known issue.
You have to do several things in order to alleviate the problem:
* increase the additional memory pool
(Note: I think this should be the buffer pool rather than the additional memory pool, because InnoDB memory management now mostly calls the system malloc,
i.e. innodb_use_sys_malloc=ON; see https://dev.mysql.com/doc/refman/5.7/en/innodb-performance-use_sys_malloc.html)
* increase total number of file handles available to MySQL
* increase number of file handles for InnoDB
* improve performance of the I/O on your operating system
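In the end, the database server ran into an I/O problem. It can be mitigated by enlarging the buffer pool to cache more data, or by improving the server's I/O capacity, which is a broad topic; see "8.5.8 Optimizing InnoDB Disk I/O" (see the references at the end of the article). However, this production server has been running for a year without any I/O problems even at peak times, let alone during off-peak hours, and there was no manual operation at the time. So the blame can only go to Alibaba Cloud RDS: we suspect disk jitter on the physical machine hosting the instance.
After all this analysis, we reached this conclusion but could do nothing about it, because we cannot see server-level I/O stats. Last year several instances had similar statistics problems; each time we filed a ticket with Alibaba Cloud to confirm the state of the physical machine, the answer was: "Yes, the physical machine had jitter. Would you like to request an instance migration?", but we never received any evidence. If we could see OS-level monitoring ourselves, an analysis as lengthy as this one would not be needed.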
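Originally published: 2017-10-25
Author: 周曉, a student in the 8th class of 知數堂
This article comes from "老葉茶館", a partner of the Yunqi Community (云栖社区); for more information, follow the "老葉茶館" WeChat official account.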