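Linux Mutex Mechanism and Deadlock Analysis
On Linux, a mutex is simpler and more efficient to implement than a semaphore, but its usage rules are stricter:
1. Only one task can hold the mutex at any given time
2. Whoever locks the mutex must be the one to unlock it
3. Recursive locking and unlocking are not allowed
4. A process must not exit while it holds a mutex
5. A mutex may only be managed through its API; it must not be copied, initialized by hand, or initialized more than once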
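Rules 2 and 3 can be observed from user space. The sketch below assumes a POSIX threads implementation such as glibc and uses an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), which reports misuse through return codes instead of hanging. The function name check_mutex_rules is made up for this illustration and does not appear in the original code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* expose the full pthread API under strict -std modes */
#endif
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper: returns 0 when an error-checking mutex rejects both
 * a second lock by its holder (EDEADLK) and an unlock by a caller that does
 * not hold it (EPERM); returns -1 otherwise.
 */
int check_mutex_rules(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) != 0 ||
        pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    if (pthread_mutex_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }
    int relock = pthread_mutex_lock(&m);   /* rule 3: second lock by the holder */
    pthread_mutex_unlock(&m);              /* release the one lock actually held */
    int stray = pthread_mutex_unlock(&m);  /* rule 2: caller no longer holds it */
    pthread_mutex_destroy(&m);

    return (relock == EDEADLK && stray == EPERM) ? 0 : -1;
}
```

Calling check_mutex_rules() from a small main and building with cc -pthread should be enough to try it. With the default mutex type the same misuse is not reported this way, which is why the error-checking type is used here.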
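At the application level, mutexes are mostly used to synchronize threads. This article examines and tests the fourth rule, "a process must not exit while it holds a mutex".
A classic scenario for using a mutex across processes is communication through shared memory.
When two processes communicate through shared memory, they generally need a mutex to protect the data.
Wherever locks are used, deadlock becomes possible. The rest of this article uses a multi-process mutex to analyze a deadlock.
First, look at the following code, which shows multi-process use of a mutex and simulates the deadlock scenario: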
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <string.h>

// Test focus:
// a process that is holding a mutex must not exit!
int main()
{
    pid_t pid;
    int shmid;
    int* shmptr;
    int* tmp;
    int err;
    pthread_mutexattr_t mattr;

    // Mutex objects may only be initialized through the mutex API.
    // pthread functions return 0 on success and a positive error number on failure.
    if ((err = pthread_mutexattr_init(&mattr)) != 0) {
        printf("mutex attr init error:%s\n", strerror(err));
        exit(1);
    }

    // For process synchronization use PTHREAD_PROCESS_SHARED;
    // the default attribute only synchronizes threads.
    if ((err = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED)) != 0) {
        printf("mutex attr set pshared error:%s\n", strerror(err));
        exit(1);
    }

    // Note: this is a big pitfall. The mutex must be created in shared memory
    // so that parent and child operate on the same mutex. Otherwise the parent's
    // mutex belongs to the parent, the child's mutex belongs to the child,
    // and no synchronization takes place.
    pthread_mutex_t* m;
    int mid = shmget(IPC_PRIVATE, sizeof(pthread_mutex_t), 0600);
    m = (pthread_mutex_t*)shmat(mid, NULL, 0);
    if (m == (void*)-1) {
        perror("shmat mutex error");
        exit(1);
    }

    // Mutex objects may only be initialized through the mutex API.
    if ((err = pthread_mutex_init(m, &mattr)) != 0) {
        printf("mutex init error:%s\n", strerror(err));
        exit(1);
    }

    // Create a shared memory region that parent and child both write to.
    if ((shmid = shmget(IPC_PRIVATE, 1000, IPC_CREAT | 0600)) < 0) {
        perror("shmget error");
        exit(1);
    }

    // Get a pointer to the shared memory.
    if ((shmptr = shmat(shmid, 0, 0)) == (void*)-1) {
        perror("shmat error");
        exit(1);
    }
    tmp = shmptr;

    // Create another shared memory segment that holds a pointer into the region above.
    int shmid2;
    int** shmptr2;
    if ((shmid2 = shmget(IPC_PRIVATE, 20, IPC_CREAT | 0600)) < 0) {
        perror("shmget2 error");
        exit(1);
    }

    // Get a pointer to the shared memory.
    if ((shmptr2 = shmat(shmid2, 0, 0)) == (void*)-1) {
        perror("shmat2 error");
        exit(1);
    }

    // Make *shmptr2 point to the start of the segment identified by shmid.
    *shmptr2 = shmptr;

    if ((pid = fork()) < 0) {
        perror("fork error");
        exit(1);
    } else if (pid == 0) {
        // Child process.
        // Lock the mutex here; while the child holds it, the parent cannot acquire it.
        if ((err = pthread_mutex_lock(m)) != 0) {
            printf("lock error:%s\n", strerror(err));
            exit(1);
        }
        for (int i = 0; i < 30; ++i) {
            **shmptr2 = i;
            (*shmptr2)++;
        }
        // Simulate the deadlock scenario
        //exit(1); // exit while holding the lock
        if ((err = pthread_mutex_unlock(m)) != 0) {
            printf("unlock error:%s\n", strerror(err));
            exit(1);
        }
        exit(0);
    } else {
        sleep(1); // wait a moment so that the child runs first
        // Lock the mutex here; while the parent holds it, the child cannot acquire it.
        if ((err = pthread_mutex_lock(m)) != 0) {
            printf("lock error:%s\n", strerror(err));
            exit(1);
        }
        for (int i = 40; i < 70; ++i) {
            **shmptr2 = i;
            (*shmptr2)++;
        }
        if ((err = pthread_mutex_unlock(m)) != 0) {
            printf("unlock error:%s\n", strerror(err));
            exit(1);
        }
    }

    // Reap the child to avoid a zombie process.
    wait(NULL);

    // Print the contents of the shared memory.
    for (int i = 0; i < 70; ++i) {
        printf("%d ", tmp[i]);
    }
    printf("\n");

    // Destroy the mutex attribute object.
    pthread_mutexattr_destroy(&mattr);
    // Destroy the mutex.
    pthread_mutex_destroy(m);
    exit(0);
}
The output of a normal run is:
#$ ./a.out
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 0 0 0 0 0 0 0 0 0 0
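With the mutex protecting the data, the writes are not interleaved: the child runs first and writes the consecutive values 0-29, then the parent runs and writes the consecutive values 40-69.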
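The pitfall noted in the code comments, that the mutex itself must live in memory both processes can see, does not depend on System V shared memory. As a minimal sketch under stated assumptions, the version below swaps shmget/shmat for an anonymous shared mapping created with mmap before fork. The names shared_area, add_locked and run_shared_counter are hypothetical and only serve this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The mutex and the data it protects share one mapping visible to both processes. */
struct shared_area {
    pthread_mutex_t lock;
    long counter;
};

/* Increment the shared counter n times, taking the lock for each update. */
static void add_locked(struct shared_area *area, long n)
{
    for (long i = 0; i < n; ++i) {
        pthread_mutex_lock(&area->lock);
        area->counter++;
        pthread_mutex_unlock(&area->lock);
    }
}

/* Hypothetical demo: returns the final counter value, or -1 on a setup error. */
long run_shared_counter(long iterations)
{
    pthread_mutexattr_t attr;
    pid_t pid;
    long result = -1;
    struct shared_area *area = mmap(NULL, sizeof(*area), PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        return -1;

    if (pthread_mutexattr_init(&attr) != 0)
        goto unmap;
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutex_init(&area->lock, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        goto unmap;
    }
    pthread_mutexattr_destroy(&attr);
    area->counter = 0;

    pid = fork();
    if (pid == 0) {                  /* child: update, then exit with the lock released */
        add_locked(area, iterations);
        _exit(0);
    }
    if (pid > 0) {                   /* parent: update concurrently, then reap the child */
        add_locked(area, iterations);
        waitpid(pid, NULL, 0);
        result = area->counter;
    }
    pthread_mutex_destroy(&area->lock);
unmap:
    munmap(area, sizeof(*area));
    return result;
}
```

If the two processes really share the mutex, run_shared_counter(1000) should return 2000. Placing the struct in ordinary (private) memory instead would give each process its own copy after fork, which is exactly the situation the original comment warns about.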
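Next, simulate the deadlock by making the child exit after it has taken the lock. Uncomment the following code at line 103 of the original source file (the commented-out exit(1) in the child branch):
//simulate the deadlock scenario
exit(1); //exit while holding the lock
After this change the program hangs with no output; that is, it has deadlocked.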
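The hang happens because a default mutex keeps no usable record that its owner has died. One possible remedy, which the original test program does not use, is a POSIX robust mutex: with PTHREAD_MUTEX_ROBUST set, the next locker is told about the dead owner through EOWNERDEAD instead of blocking. The sketch below is an illustration under that assumption; the function name lock_after_owner_exit is hypothetical, and an anonymous shared mapping stands in for the shmget segment.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS and the robust-mutex API under strict -std modes */
#endif
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: the child exits while holding a robust, process-shared
 * mutex. Returns what the parent's pthread_mutex_lock() reported, or -1 on
 * a setup error.
 */
int lock_after_owner_exit(void)
{
    pthread_mutexattr_t attr;
    pid_t pid;
    int rc = -1;
    pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    if (pthread_mutexattr_init(&attr) != 0)
        goto unmap;
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0 ||
        pthread_mutex_init(m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        goto unmap;
    }
    pthread_mutexattr_destroy(&attr);

    pid = fork();
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(1);                         /* exit while holding the lock */
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);            /* from here on the owner is gone */
        rc = pthread_mutex_lock(m);       /* reports EOWNERDEAD rather than blocking */
        if (rc == EOWNERDEAD)
            pthread_mutex_consistent(m);  /* real code should repair the protected data first */
        if (rc == 0 || rc == EOWNERDEAD)
            pthread_mutex_unlock(m);
    }
    pthread_mutex_destroy(m);
unmap:
    munmap(m, sizeof(*m));
    return rc;
}
```

On a system with robust mutex support the function is expected to return EOWNERDEAD. Whether recovery is appropriate depends on the application: the data the lock protected may have been left half-written by the process that died.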
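Several commonly used tools for analyzing such a deadlock are described below.
The first that comes to mind is gdb. When the source code is available and can be compiled, gdb is the most direct option.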
#$ gdb ./a.out
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...done.
(gdb) r
Starting program: /media/gwind/windcode/self-code-snippet/misc/utils-core/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
^C    (the program hangs at this point; press Ctrl+C to interrupt it)
Program received signal SIGINT, Interrupt.
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt    (print the call stack)
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff7bc3dbd in __GI___pthread_mutex_lock (mutex=0x7ffff7ff6000) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000000000400d78 in main () at mutex-multi-process.c:119
(gdb)
The backtrace shows that the program is stuck at mutex-multi-process.c:119, where __lll_lock_wait waits for the lock indefinitely.
In other words, after the child takes the lock and exits, the parent can never acquire it and waits forever.
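While debugging, it can help to make such a wait bounded so that the program reports the problem instead of hanging. The sketch below is an illustration, not part of the original test: it replaces pthread_mutex_lock with pthread_mutex_timedlock, which takes an absolute CLOCK_REALTIME deadline. The helper names lock_with_timeout and timed_lock_after_owner_exit are hypothetical, and the scenario reproduces the article's case of a child exiting with a non-robust mutex held.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wait at most `seconds` for the mutex; returns 0 or a pthread error number. */
int lock_with_timeout(pthread_mutex_t *m, int seconds)
{
    struct timespec deadline;

    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return errno;
    deadline.tv_sec += seconds;   /* the deadline is absolute, not relative */
    return pthread_mutex_timedlock(m, &deadline);
}

/*
 * Hypothetical demo: the child exits while holding an ordinary (non-robust)
 * process-shared mutex; the parent then tries to lock it with a 1 s limit.
 * Returns the result of that attempt, or -1 on a setup error.
 */
int timed_lock_after_owner_exit(void)
{
    pthread_mutexattr_t attr;
    pid_t pid;
    int rc = -1;
    pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    if (pthread_mutexattr_init(&attr) != 0)
        goto unmap;
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutex_init(m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        goto unmap;
    }
    pthread_mutexattr_destroy(&attr);

    pid = fork();
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(1);                     /* exit while holding the lock */
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);
        rc = lock_with_timeout(m, 1); /* gives up after about one second */
        if (rc == 0)
            pthread_mutex_unlock(m);
    }
    /* The mutex may still be marked locked, so it is not destroyed here;
       the mapping is simply released. */
unmap:
    munmap(m, sizeof(*m));
    return rc;
}
```

In this scenario the parent is expected to get ETIMEDOUT after roughly one second. A timeout only turns the hang into an error; it does not say who held the lock, which is where the tools in this article come in.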
Method 2: use strace to trace the system calls
#$ strace ./a.out
execve("./a.out", ["./a.out"], [/* 78 vars */]) = 0
brk(NULL)                               = 0x22a5000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=188922, ...}) = 0
mmap(NULL, 188922, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9489b9e000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260`\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=138696, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9489b9d000
mmap(NULL, 2212904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f948978b000
mprotect(0x7f94897a3000, 2093056, PROT_NONE) = 0
mmap(0x7f94899a2000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f94899a2000
mmap(0x7f94899a4000, 13352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f94899a4000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\t\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1868984, ...}) = 0
mmap(NULL, 3971488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f94893c1000
mprotect(0x7f9489581000, 2097152, PROT_NONE) = 0
mmap(0x7f9489781000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c0000) = 0x7f9489781000
mmap(0x7f9489787000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9489787000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9489b9c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9489b9b000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9489b9a000
arch_prctl(ARCH_SET_FS, 0x7f9489b9b700) = 0
mprotect(0x7f9489781000, 16384, PROT_READ) = 0
mprotect(0x7f94899a2000, 4096, PROT_READ) = 0
mprotect(0x601000, 4096, PROT_READ)     = 0
mprotect(0x7f9489bcd000, 4096, PROT_READ) = 0
munmap(0x7f9489b9e000, 188922)          = 0
set_tid_address(0x7f9489b9b9d0)         = 1010
set_robust_list(0x7f9489b9b9e0, 24)     = 0
rt_sigaction(SIGRTMIN, {0x7f9489790b50, [], SA_RESTORER|SA_SIGINFO, 0x7f948979c390}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f9489790be0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f948979c390}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
shmget(IPC_PRIVATE, 40, 0600)           = 24346824
shmat(24346824, NULL, 0)                = 0x7f9489bcc000
shmget(IPC_PRIVATE, 1000, IPC_CREAT|0600) = 24379593
shmat(24379593, NULL, 0)                = 0x7f9489bcb000
shmget(IPC_PRIVATE, 20, IPC_CREAT|0600) = 24412362
shmat(24412362, NULL, 0)                = 0x7f9489bca000
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f9489b9b9d0) = 1011
nanosleep({1, 0}, {0, 999848639})       = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1011, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
futex(0x7f9489bcc000, FUTEX_WAIT, 2, NULL
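The trace ends blocked in the futex call. The SIGCHLD entry just before it shows that the child (pid 1011) had already exited with status 1, so nobody is left to release the lock.
futex (fast userspace mutex) is a basic Linux building block from which higher-level synchronization mechanisms such as mutexes and semaphores are built.
strace can only show that the program is blocked waiting for a lock; it cannot tell which line of code is responsible.
It is therefore best suited to cases where only the executable is available, without source code.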
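The last strace line, futex(addr, FUTEX_WAIT, 2, NULL, asks the kernel to put the caller to sleep for as long as the 32-bit word at addr still holds 2, with no timeout. The sketch below issues the same operation directly through syscall(2), assuming 64-bit Linux where SYS_futex is available; glibc exports no futex() wrapper, so futex_wait and futex_wait_errno are hypothetical helpers. The reading of the value 2 as "locked, possibly with waiters" reflects glibc's low-level lock as commonly described and should be treated as an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for syscall() under strict -std modes */
#endif
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrapper around the raw system call. */
static long futex_wait(uint32_t *addr, uint32_t expected,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, timeout, NULL, 0);
}

/*
 * Hypothetical helper: performs FUTEX_WAIT on a word that holds `value`
 * while claiming it holds `expected`, with a 100 ms relative timeout.
 * Returns 0 if woken, otherwise the errno of the failed wait.
 */
int futex_wait_errno(uint32_t value, uint32_t expected)
{
    uint32_t word = value;
    struct timespec timeout = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

    if (futex_wait(&word, expected, &timeout) == 0)
        return 0;
    return errno;
}
```

When the word matches the expected value and nobody issues a wake-up, as in futex_wait_errno(2, 2), the call should fail with ETIMEDOUT; with the NULL timeout seen in the trace it would instead sleep indefinitely, which is the hang observed. When the word no longer matches, as in futex_wait_errno(0, 2), the kernel refuses to sleep and reports EAGAIN.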
Method 3: use valgrind's drd tool
#$ valgrind valgrind --tool=drd --trace-mutex=yes ./a.out
==7590== Memcheck, a memory error detector
==7590== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7590== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==7590== Command: /usr/bin/valgrind --tool=drd --trace-mutex=yes ./a.out
==7590==
==7590== drd, a thread error detector
==7590== Copyright (C) 2006-2015, and GNU GPL'd, by Bart Van Assche.
==7590== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==7590== Command: ./a.out
==7590==
==7590== [1] mutex_init      mutex 0x4027000
==7608== [1] mutex_trylock   mutex 0x4027000 rc 0 owner 0
==7608== [1] post_mutex_lock mutex 0x4027000 rc 0 owner 0
==7608== [1] mutex_trylock   recursive mutex 0x4226948 rc 0 owner 0
==7608== [1] post_mutex_lock recursive mutex 0x4226948 rc 0 owner 0
==7608== [1] mutex_unlock    recursive mutex 0x4226948 rc 1
==7608==
==7608== For counts of detected and suppressed errors, rerun with: -v
==7608== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==7590== [1] mutex_trylock   mutex 0x4027000 rc 0 owner 0
^C    (the deadlock occurs here; press Ctrl+C to interrupt)
==7590== Process terminating with default action of signal 2 (SIGINT)
==7590==    at 0x4E5C26D: __lll_lock_wait (lowlevellock.S:135)
==7590==    by 0x4E55DBC: pthread_mutex_lock (pthread_mutex_lock.c:80)
==7590==    by 0x4C371FE: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==7590==    by 0x400D77: main (mutex-multi-process.c:119)
==7590==
==7590== For counts of detected and suppressed errors, rerun with: -v
==7590== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
DRD produces fairly verbose output; it can also be used to analyze how long locks are held.
Method 4: use valgrind's helgrind tool
gwind@gwind-P5820T:/media/gwind/windcode/self-code-snippet/misc/utils-core$ valgrind --tool=helgrind ./a.out
==7003== Helgrind, a thread error detector
==7003== Copyright (C) 2007-2015, and GNU GPL'd, by OpenWorks LLP et al.
==7003== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==7003== Command: ./a.out
==7003==
==7004== ---Thread-Announcement------------------------------------------
==7004==
==7004== Thread #1 is the program's root thread
==7004==
==7004== ----------------------------------------------------------------
==7004==
==7004== Thread #1: Exiting thread still holds 1 lock
==7004==    at 0x51297C8: _Exit (_exit.c:31)
==7004==    by 0x5096FBA: __run_exit_handlers (exit.c:97)
==7004==    by 0x5097054: exit (exit.c:104)
==7004==    by 0x400D61: main (mutex-multi-process.c:104)
==7004==
==7004==
==7004== For counts of detected and suppressed errors, rerun with: -v
==7004== Use --history-level=approx or =none to gain increased speed, at
==7004== the cost of reduced accuracy of conflicting-access information
==7004== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
^C    (press Ctrl+C to interrupt)
==7003== Process terminating with default action of signal 2 (SIGINT)
==7003==    at 0x4E5026D: __lll_lock_wait (lowlevellock.S:135)
==7003==    by 0x4E49DBC: pthread_mutex_lock (pthread_mutex_lock.c:80)
==7003==    by 0x4C32156: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7003==    by 0x400D77: main (mutex-multi-process.c:119)
==7003==
==7003== For counts of detected and suppressed errors, rerun with: -v
==7003== Use --history-level=approx or =none to gain increased speed, at
==7003== the cost of reduced accuracy of conflicting-access information
==7003== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
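helgrind's output is more concise and easier to read: it directly reports "Exiting thread still holds 1 lock" for the child's exit call at mutex-multi-process.c:104.
Summary:
1. Mutexes are mostly used to synchronize threads. To synchronize processes, the mutex must be set to PTHREAD_PROCESS_SHARED and placed in memory shared by those processes
2. When analyzing a deadlock, choose and combine tools according to the situation:
strace traces the state of system calls and works without source code
gdb can be used directly when the source is available and can be compiled
valgrind's drd tool can analyze lock state and also help evaluate lock efficiency, while helgrind gives a more concise deadlock report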