
MySQL Instance Blocking: A Case Analysis (Threads in the statistics State)

Posted: 2024/7/23


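This article works through a real incident to analyze a MySQL stall in which threads sat in the statistics state.

1. Symptoms

One afternoon, during the off-peak period after working hours, a production MySQL instance suddenly reported a large number of slow queries, all in the statistics state. When we re-ran the same statements afterwards, they returned quickly. The threads stuck in statistics had one thing in common: they were all querying views. Monitoring for that window showed no noticeable UPDATE/DELETE/INSERT activity.

Our snapshot tool had captured the InnoDB status at the time, which showed the following: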

    SEMAPHORES
    ----------
    OS WAIT ARRAY INFO: reservation count 17208994
    --Thread 139964610234112 has waited at srv0srv.cc line 2132 for 14.00 seconds the semaphore:
    X-lock (wait_ex) on RW-latch at 0x1635a00 created in file dict0dict.cc line 900
    a writer (thread id 139964610234112) has reserved it in mode wait exclusive
    number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
    Last time read locked in file row0purge.cc line 720
    Last time write locked in file /home/admin/146_20161018140650857_13830810_code/rpm_workspace/storage/innobase/srv/srv0srv.cc line 2132
    OS WAIT ARRAY INFO: signal count 256984450
    Mutex spin waits 626367674, rounds 2776951802, OS waits 1973672
    RW-shared spins 149944457, rounds 1650148561, OS waits 3972058
    RW-excl spins 72090467, rounds 2017802579, OS waits 11148264
    Spin rounds per wait: 4.43 mutex, 11.01 RW-shared, 27.99 RW-excl
    ...

    FILE I/O
    --------
    I/O thread 0 state: waiting for i/o request (insert buffer thread)
    I/O thread 1 state: waiting for i/o request (log thread)
    I/O thread 2 state: waiting for i/o request (read thread)
    I/O thread 3 state: doing file i/o (read thread) ev set
    I/O thread 4 state: waiting for i/o request (read thread)
    I/O thread 5 state: doing file i/o (read thread) ev set
    I/O thread 6 state: doing file i/o (write thread) ev set
    I/O thread 7 state: waiting for i/o request (write thread)
    I/O thread 8 state: waiting for i/o request (write thread)
    I/O thread 9 state: waiting for i/o request (write thread)
    Pending normal aio reads: 18 [0, 12, 0, 6] , aio writes: 1 [1, 0, 0, 0] ,
    ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
    Pending flushes (fsync) log: 0; buffer pool: 0
    1346747614 OS file reads, 2869418806 OS file writes, 524616747 OS fsyncs
    22 pending preads, 1 pending pwrites
    6.00 reads/s, 16384 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
    ...

    ROW OPERATIONS
    --------------
    0 queries inside InnoDB, 0 queries in queue
    38 read views open inside InnoDB
    Main thread process no. 34414, id 139964610234112, state: enforcing dict cache limit
    Number of rows inserted 2546811699, updated 1708150459, deleted 1004154696, read 413168628410
    0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 54.19 reads/s

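2. Analysis

From the output above, Thread 139964610234112 is the main thread (its id matches the "Main thread" line under ROW OPERATIONS). It has waited 14 seconds at srv0srv.cc line 2132 for a semaphore: an exclusive lock (X-lock) on the RW-latch created at dict0dict.cc line 900. Oddly, the main thread appears to be waiting on a latch that it has itself reserved. (In wait_ex mode the writer has reserved the latch and is waiting for existing readers to release it. The output shows one reader, and the most recent read lock was taken at row0purge.cc line 720, though that is not necessarily the current holder.)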
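The way the SEMAPHORES entry is read above can be sketched as a small parser. This is only an illustration: the function name `parse_semaphore_waits` and the field names are assumptions, and the regular expressions are written against the exact lines quoted in this article, so other MySQL versions may need different patterns.

```python
import re

# Lines copied from the SEMAPHORES section quoted earlier in the article.
SEMAPHORES_SAMPLE = """\
--Thread 139964610234112 has waited at srv0srv.cc line 2132 for 14.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x1635a00 created in file dict0dict.cc line 900
a writer (thread id 139964610234112) has reserved it in mode wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file row0purge.cc line 720
"""

WAIT_RE = re.compile(
    r"--Thread (?P<thread>\d+) has waited at (?P<file>\S+) line (?P<line>\d+) "
    r"for (?P<seconds>[\d.]+) seconds the semaphore:"
)
# Detail lines that follow a wait header, keyed by the field they fill in.
DETAIL_RES = {
    "latch_created_at": re.compile(
        r"created in file (?P<file>\S+) line (?P<line>\d+)"
    ),
    "last_read_locked_at": re.compile(
        r"Last time read locked in file (?P<file>\S+) line (?P<line>\d+)"
    ),
}
READERS_RE = re.compile(r"number of readers (\d+)")


def parse_semaphore_waits(text):
    """Return one dict per '--Thread ... has waited' entry (hypothetical helper)."""
    waits = []
    for raw in text.splitlines():
        line = raw.strip()
        header = WAIT_RE.match(line)
        if header:
            waits.append({
                "thread": int(header["thread"]),
                "waited_at": f'{header["file"]}:{header["line"]}',
                "seconds": float(header["seconds"]),
            })
            continue
        if not waits:
            continue  # detail line with no preceding wait header
        current = waits[-1]
        readers = READERS_RE.search(line)
        if readers:
            current["readers"] = int(readers.group(1))
        for key, pattern in DETAIL_RES.items():
            found = pattern.search(line)
            if found:
                current[key] = f'{found["file"]}:{found["line"]}'
    return waits


for wait in parse_semaphore_waits(SEMAPHORES_SAMPLE):
    print(wait)
```

Collecting the waiting location, the latch's creation site, and the reader count in one record makes it easier to compare snapshots taken a few seconds apart.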

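The instance runs on Alibaba Cloud RDS (based on MySQL 5.6.16-log), whose source code we cannot obtain, so I looked at the 5.6.35 source instead, and the line numbers do not match. Fortunately, the end of the output above reports the main thread state as enforcing dict cache limit, which is set in srv_master_do_active_tasks() in srv0srv.cc, at around line 2137: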

    if (cur_time % SRV_MASTER_DICT_LRU_INTERVAL == 0) {
        srv_main_thread_op_info = "enforcing dict cache limit";
        srv_master_evict_from_table_cache(50);
        MONITOR_INC_TIME_IN_MICRO_SECS(
            MONITOR_SRV_DICT_LRU_MICROSECOND, counter_time);
    }

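The wait should therefore be inside the call to srv_master_evict_from_table_cache(), which evicts entries from the InnoDB table cache. (It does not always evict: it first checks whether the cache is over its limit, and the argument 50 means that only 50% of the unused-table list is scanned.) The function srv_master_evict_from_table_cache() is: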

    static
    ulint
    srv_master_evict_from_table_cache(
    /*==============================*/
        ulint pct_check) /*!< in: max percent to check */
    {
        ulint n_tables_evicted = 0;

        rw_lock_x_lock(&dict_operation_lock);

        dict_mutex_enter_for_mysql();

        n_tables_evicted = dict_make_room_in_cache( /* defined in dict0dict.cc */
            innobase_get_table_cache_size(), pct_check);

        dict_mutex_exit_for_mysql();

        rw_lock_x_unlock(&dict_operation_lock);

        return(n_tables_evicted);
    }

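The 14-second wait is at rw_lock_x_lock(&dict_operation_lock), where the latch is acquired. This latch is created in dict_init() in the data dictionary module dict0dict.cc, at around line 1065 in 5.6.35, which is broadly consistent with the InnoDB status output.

As for dict_operation_lock, the source comment explains it: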
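The "check first, then scan only a percentage of the list" behaviour described above can be modelled roughly as follows. This is a simplified sketch, not InnoDB's implementation: the name `make_room_in_cache`, the list-based LRU, and the `can_evict` callback are all assumptions made for illustration.

```python
def make_room_in_cache(lru, max_tables, pct_check, can_evict):
    """Toy model of LRU eviction with a bounded scan (not InnoDB's actual code).

    lru        -- table names, most recently used first (modified in place)
    max_tables -- cache size limit
    pct_check  -- maximum percentage of the list to scan from the LRU tail
    can_evict  -- callback returning True if a table is unused
    Returns the number of tables evicted.
    """
    length = len(lru)
    if length < max_tables:
        return 0  # the cache is within its limit, nothing to evict

    # Only the last pct_check percent of the list is examined.
    scan_count = (length * pct_check) // 100
    candidates = list(reversed(lru))[:scan_count]

    evicted = 0
    for name in candidates:
        if length - evicted <= max_tables:
            break  # enough room has been made
        if can_evict(name):
            lru.remove(name)
            evicted += 1
    return evicted


# Hypothetical cache: 10 tables, limit of 4, table t9 is still in use.
tables = [f"t{i}" for i in range(1, 11)]
n = make_room_in_cache(tables, max_tables=4, pct_check=50,
                       can_evict=lambda name: name != "t9")
print(n, tables)
```

In this example the scan limit is what stops the eviction: the cache is still over its limit afterwards, which is why the real task runs again on its next interval and takes the latch in X-mode each time.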

    /** @brief the data dictionary rw-latch protecting dict_sys

    table create, drop, etc. reserve this in X-mode;
    implicit or backround operations purge, rollback, foreign key checks reserve this in S-mode;
    we cannot trust that MySQL protects implicit or background operations a table drop since MySQL does not know of them;
    therefore we need this; NOTE: a transaction which reserves this must keep book on the mode in trx_t::dict_operation_lock_mode */

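Evicting table definitions from the cache takes dict_operation_lock in X-mode, but the information available does not identify which operation was holding the data dictionary latch on the other side. My first suspicion was that table_definition_cache, table_open_cache, or innodb_open_files was set too low: views are usually multi-table joins, so they use up open-table slots faster, which could cause constant cache eviction and therefore lock contention. After checking, I found nothing wrong with those settings, and they would hardly explain a 14-second wait anyway. For these settings and how they relate, see my article on the impact of table_open_cache and table_definition_cache on MySQL.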
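One way to sanity-check the cache settings mentioned above is to compare each limit with the matching status counter. The sketch below assumes the values have already been fetched (for example with SHOW GLOBAL VARIABLES and SHOW GLOBAL STATUS) into two dicts. The function name `cache_utilization` and the sample numbers are hypothetical, not taken from the incident.

```python
def cache_utilization(status, variables):
    """Return how full each table cache is, as a fraction of its limit.

    status    -- dict of SHOW GLOBAL STATUS values (strings or ints)
    variables -- dict of SHOW GLOBAL VARIABLES values (strings or ints)
    """
    # Each configured limit is paired with the counter that measures its use.
    pairs = {
        "table_open_cache": "Open_tables",
        "table_definition_cache": "Open_table_definitions",
    }
    return {
        limit: int(status[counter]) / int(variables[limit])
        for limit, counter in pairs.items()
    }


# Hypothetical sample values, for illustration only.
sample_status = {"Open_tables": "1500", "Open_table_definitions": "400"}
sample_variables = {"table_open_cache": "2000", "table_definition_cache": "1600"}
print(cache_utilization(sample_status, sample_variables))
```

A ratio that stays close to 1.0 while Opened_tables keeps climbing would point toward cache churn. In the case described here the check came back clean.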

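So a different angle was needed. The processlist showed 13 threads that had been in the statistics state for a long time. This state means the server is calculating statistics to develop a query execution plan. If a thread stays in it for a long time, disk I/O is probably slow, or the disk is busy with other work.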
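Picking such threads out of a processlist snapshot can be sketched as a simple filter. The rows are assumed to be dicts keyed by the SHOW PROCESSLIST column names (Id, Time, State, Info). The function name, the 5-second threshold, and the sample rows are hypothetical.

```python
def long_statistics_threads(processlist, min_seconds=5):
    """Return rows that have been in the 'statistics' state for min_seconds or more."""
    return [
        row for row in processlist
        if (row.get("State") or "").lower() == "statistics"
        and int(row.get("Time") or 0) >= min_seconds
    ]


# Hypothetical snapshot rows, for illustration only.
snapshot = [
    {"Id": 101, "Time": 14, "State": "statistics", "Info": "SELECT ... FROM v_orders"},
    {"Id": 102, "Time": 1, "State": "statistics", "Info": "SELECT ... FROM v_users"},
    {"Id": 103, "Time": 30, "State": "Sending data", "Info": "SELECT ..."},
    {"Id": 104, "Time": 0, "State": None, "Info": None},
]
for row in long_statistics_threads(snapshot):
    print(row["Id"], row["Time"], row["Info"])
```

Counting these rows per snapshot gives a quick signal to alert on, since the state is normally too brief to show up at all.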

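At this point the line Pending normal aio reads: 18 [0, 12, 0, 6] in the output above stands out: 18 read I/Os are pending (the innodb_data_pending_reads monitoring graph actually peaked at 50), and three I/O threads are doing file I/O (read threads 3 and 5 and write thread 6). In addition, innodb_buffer_pool_pages_flushed did not change at all in the 10 seconds before the incident, meaning no dirty pages were successfully flushed to disk. This instance is also a replica, and in the 10 seconds before the incident its replication delay briefly spiked to more than 20 seconds.

(Everything here concerns data from before 18:59:05. After that point, once the instance recovers, there is usually a momentary surge in rows read, which should not be mistaken for the cause.)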
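The same reading of the FILE I/O section can be sketched in code. The function name `summarize_file_io` and the returned field names are assumptions, and the patterns are written against the exact lines quoted in this article.

```python
import re

# Lines copied from the FILE I/O section quoted earlier in the article.
FILE_IO_SAMPLE = """\
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: doing file i/o (read thread) ev set
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: doing file i/o (read thread) ev set
I/O thread 6 state: doing file i/o (write thread) ev set
Pending normal aio reads: 18 [0, 12, 0, 6] , aio writes: 1 [1, 0, 0, 0] ,
"""

THREAD_RE = re.compile(
    r"I/O thread (?P<id>\d+) state: (?P<state>.+?) \((?P<kind>[\w ]+) thread\)"
)
PENDING_READS_RE = re.compile(
    r"Pending normal aio reads: (?P<total>\d+) \[(?P<per_thread>[\d, ]+)\]"
)


def summarize_file_io(text):
    """Collect busy I/O threads and pending aio reads (hypothetical helper)."""
    busy = [
        (int(m["id"]), m["kind"])
        for m in THREAD_RE.finditer(text)
        if m["state"] == "doing file i/o"
    ]
    summary = {
        "busy_threads": busy,
        "pending_reads": None,
        "pending_per_read_thread": [],
    }
    pending = PENDING_READS_RE.search(text)
    if pending:
        summary["pending_reads"] = int(pending["total"])
        summary["pending_per_read_thread"] = [
            int(n) for n in pending["per_thread"].split(",")
        ]
    return summary


print(summarize_file_io(FILE_IO_SAMPLE))
```

The per-thread breakdown shows that the pending reads are concentrated on the two read threads that are busy, which fits a slow device rather than an unusually large number of requests.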
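Spotting a counter that stops moving, as innodb_buffer_pool_pages_flushed did before the incident, can be sketched from periodic samples. The function name `flat_intervals`, the 10-second threshold, and the sample values below are hypothetical and are not the incident's real data.

```python
def flat_intervals(samples, min_seconds=10):
    """Find spans where a cumulative counter did not change.

    samples -- list of (timestamp_seconds, counter_value), sorted by time
    Returns (start, end) pairs for every span of at least min_seconds
    during which the value stayed constant.
    """
    spans = []
    start = None
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 == v0:
            if start is None:
                start = t0  # the counter has just stopped moving
        else:
            if start is not None and t0 - start >= min_seconds:
                spans.append((start, t0))
            start = None
    # A flat span that is still open at the last sample also counts.
    if start is not None and samples[-1][0] - start >= min_seconds:
        spans.append((start, samples[-1][0]))
    return spans


# Hypothetical samples taken every 5 seconds, for illustration only.
flushed = [(0, 100), (5, 120), (10, 120), (15, 120), (20, 120), (25, 150)]
print(flat_intervals(flushed))
```

Restricting the samples to the window before the incident timestamp keeps the recovery surge out of the picture, in line with the caution above.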

3. Conclusion

Combining enforcing dict cache limit with the pending I/O behind the statistics state, I found two related bug reports:

https://bugs.launchpad.net/percona-server/+bug/1500176

https://bugs.mysql.com/bug.php?id=84424

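The first concerns using pt-online-schema-change to alter a partitioned table, where this can occur, but the bug status is still Undecided, and our environment has no partitioned tables, no foreign keys, and no schema changes in progress. The second turns out to be Not a bug: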

Thank you for your bug report. This is, however, not a bug, but a very well known issue.

You have to do several things in order to alleviate the problem:

* increase the additional memory pool
(Note: I think this should be the buffer pool rather than the additional memory pool, because InnoDB now mostly calls the system malloc for memory management, i.e. innodb_use_sys_malloc=ON; see https://dev.mysql.com/doc/refman/5.7/en/innodb-performance-use_sys_malloc.html)
* increase total number of file handles available to MySQL
* increase number of file handles for InnoDB
* improve performance of the I/O on your operating system

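In the end, the database server ran into an I/O problem. It can be mitigated by enlarging the buffer pool to cache more data, or by improving the server's I/O capacity, which is a broad topic; see "8.5.8 Optimizing InnoDB Disk I/O" in the MySQL Reference Manual. However, this production server had been running for a year without I/O problems even at peak, let alone off-peak, and nobody had made any manual changes. That leaves Alibaba Cloud RDS to take the blame: I suspect I/O jitter on the disks of the physical host running the instance.

After all this analysis, the conclusion leaves us with nothing to act on, because we cannot see server-level I/O stats. Last year several instances showed similar statistics problems. Each time we filed a ticket with Alibaba Cloud asking about the state of the physical host, the answer was: "Yes, the physical host had jitter. Would you like to apply to migrate the instance?", but we never received any evidence. With access to OS-level monitoring, none of this lengthy analysis would have been necessary.

Originally published: 2017-10-25

Author: Zhou Xiao (周晓), a student in the 8th class of Zhishutang (知数堂)

This article comes from "老叶茶馆", a partner of the Yunqi Community (云栖社区); for more, follow the "老叶茶馆" WeChat official account.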
