

The pdflush Kernel Thread Pool and Its Hidden Races

Published: 2023/12/19

The pdflush kernel thread pool is the process-context working environment that Linux creates for writing back filesystem data. Its implementation is compact: the whole file is less than 250 lines of code.

      1 /*
      2  * mm/pdflush.c - worker threads for writing back filesystem data
      3  *
      4  * Copyright (C) 2002, Linus Torvalds.
      5  *
      6  * 09Apr2002    akpm@zip.com.au
      7  *      Initial version
      8  * 29Feb2004    kaos@sgi.com
      9  *      Move worker thread creation to kthread to avoid chewing
     10  *      up stack space with nested calls to kernel_thread.
     11  */


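The header comment holds the copyright notice and the main changelog entries. kaos@sgi.com moved the creation of the worker threads over to kthread, mainly to avoid chewing up stack space through nested calls to kernel_thread. The effect of this change is also visible in the output of ps: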

    root         5     1     5  0    1 21:31 ?        00:00:00 [kthread]
    root       114     5   114  0    1 21:31 ?        00:00:00 [pdflush]
    root       115     5   115  0    1 21:31 ?        00:00:00 [pdflush]


Every pdflush kernel thread has the kthread process (pid 5) as its parent.

     12
     13 #include <linux/sched.h>
     14 #include <linux/list.h>
     15 #include <linux/signal.h>
     16 #include <linux/spinlock.h>
     17 #include <linux/gfp.h>
     18 #include <linux/init.h>
     19 #include <linux/module.h>
     20 #include <linux/fs.h>       // Needed by writeback.h
     21 #include <linux/writeback.h>    // Prototypes pdflush_operation()
     22 #include <linux/kthread.h>
     23 #include <linux/cpuset.h>
     24
     25


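These are the necessary header files. One small complaint: C++-style line comments have made their way into C, but seeing them in kernel code still feels uncomfortable. Perhaps I am being too picky. There is nothing actually wrong with them, and I may simply need to move with the times.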

     26 /*
     27  * Minimum and maximum number of pdflush instances
     28  */
     29 #define MIN_PDFLUSH_THREADS 2
     30 #define MAX_PDFLUSH_THREADS 8
     31
     32 static void start_one_pdflush_thread(void);
     33
     34


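Lines 29 and 30 define the minimum and maximum number of pdflush kernel thread instances, 2 and 8 respectively. The minimum keeps a few threads ready so that operations start with little delay. The maximum keeps an excess of threads from degrading system performance. The maximum is not strictly enforced, however, and we will come back to it when analyzing the race conditions below.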

     35 /*
     36  * The pdflush threads are worker threads for writing back dirty data.
     37  * Ideally, we'd like one thread per active disk spindle.  But the disk
     38  * topology is very hard to divine at this level.   Instead, we take
     39  * care in various places to prevent more than one pdflush thread from
     40  * performing writeback against a single filesystem.  pdflush threads
     41  * have the PF_FLUSHER flag set in current->flags to aid in this.
     42  */
     43


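The comment above briefly explains the pdflush thread pool. Roughly: pdflush threads are worker threads that write back dirty data. Ideally there would be one thread per active disk spindle, but the disk topology is hard to determine at this level. Instead, care is taken in various places to prevent more than one pdflush thread from performing writeback against a single filesystem, and pdflush threads have the PF_FLUSHER flag set in current->flags to help with this.

Evidently the kernel developers are quite "stingy" about efficiency and think things through carefully. They are just as mindful of layering, and are careful never to step across a layer boundary.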

     44 /*
     45  * All the pdflush threads.  Protected by pdflush_lock
     46  */
     47 static LIST_HEAD(pdflush_list);
     48 static DEFINE_SPINLOCK(pdflush_lock);
     49
     50 /*
     51  * The count of currently-running pdflush threads.  Protected
     52  * by pdflush_lock.
     53  *
     54  * Readable by sysctl, but not writable.  Published to userspace at
     55  * /proc/sys/vm/nr_pdflush_threads.
     56  */
     57 int nr_pdflush_threads = 0;
     58
     59 /*
     60  * The time at which the pdflush thread pool last went empty
     61  */
     62 static unsigned long last_empty_jifs;
     63


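These are the necessary global variables. To avoid polluting the kernel namespace, every variable that does not need to be exported is declared static, which limits its scope to this translation unit (the current pdflush.c file). All idle pdflush threads are linked on the doubly linked list pdflush_list. nr_pdflush_threads counts the current pdflush threads, both active and idle. last_empty_jifs records the jiffies value at which the pool last became empty (that is, had no thread available). Every operation on the pool that needs mutual exclusion is protected by the spinlock pdflush_lock.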
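The idle list is used as a stack: a thread that goes idle puts itself at the head, and work is handed to the thread at the head, so the tail always holds the thread that has been asleep longest. The sketch below models that in user space with a hand-rolled circular list. struct node and all function names here are invented for the illustration and are not the kernel's list.h API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct list_head plus the sleep timestamp. */
struct node {
    struct node *next, *prev;
    unsigned long slept_at;     /* mirrors when_i_went_to_sleep */
};

/* An empty circular list points at itself. */
void list_init(struct node *head)
{
    head->next = head->prev = head;
}

int list_is_empty(const struct node *head)
{
    return head->next == head;
}

/* Park a worker at the head of the idle list. */
void push_idle(struct node *head, struct node *n, unsigned long now)
{
    n->slept_at = now;
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Take the most recently parked worker from the head, or NULL if none. */
struct node *pop_idle(struct node *head)
{
    struct node *n = head->next;

    if (n == head)
        return NULL;
    head->next = n->next;
    n->next->prev = head;
    n->next = n->prev = n;
    return n;
}

/* The tail is the worker that has been idle longest, or NULL if none. */
struct node *oldest_idle(struct node *head)
{
    return list_is_empty(head) ? NULL : head->prev;
}
```

Pushing workers a, b and c in that order and then popping yields c first, while oldest_idle() keeps returning a until a itself is popped.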

     64 /*
     65  * The pdflush thread.
     66  *
     67  * Thread pool management algorithm:
     68  *
     69  * - The minimum and maximum number of pdflush instances are bound
     70  *   by MIN_PDFLUSH_THREADS and MAX_PDFLUSH_THREADS.
     71  *
     72  * - If there have been no idle pdflush instances for 1 second, create
     73  *   a new one.
     74  *
     75  * - If the least-recently-went-to-sleep pdflush thread has been asleep
     76  *   for more than one second, terminate a thread.
     77  */
     78


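Another long comment. I do not know whether you are tired of them yet, but I am getting a little weary myself: I only meant to say a few words about the races and ended up dragging in all of this. The comment above describes the thread pool algorithm:

  • The number of pdflush thread instances stays between MIN_PDFLUSH_THREADS and MAX_PDFLUSH_THREADS.
  • If the pool has had no idle thread for 1 second, a new thread is created.
  • If the thread that went to sleep least recently has been asleep for more than 1 second, one thread instance is terminated.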
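The three rules can be written as two small predicates. This is a user-space sketch, not kernel code: the names should_grow and should_exit and the HZ value of 100 are assumptions made for the illustration, while the two limits are the ones defined in pdflush.c.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_PDFLUSH_THREADS 2
#define MAX_PDFLUSH_THREADS 8
#define HZ 100  /* assumed tick rate for this sketch */

/*
 * Rule 2: create a thread when the pool has been without an idle
 * thread for more than one second and the upper bound allows it.
 */
bool should_grow(unsigned long now, unsigned long last_empty,
                 bool idle_list_empty, int nr_threads)
{
    return now - last_empty > 1 * HZ &&
           idle_list_empty &&
           nr_threads < MAX_PDFLUSH_THREADS;
}

/*
 * Rule 3: a thread exits when the longest sleeper has been idle for
 * more than one second, unless that would go below the lower bound.
 */
bool should_exit(unsigned long now, unsigned long oldest_sleep_start,
                 bool idle_list_empty, int nr_threads)
{
    if (idle_list_empty)
        return false;
    if (nr_threads <= MIN_PDFLUSH_THREADS)
        return false;
    return now - oldest_sleep_start > 1 * HZ;
}
```

Keeping the two decisions as pure functions of their inputs makes it easy to see that both depend on values another thread may change in the meantime, which is where the races discussed later come from.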
     79 /*
     80  * A structure for passing work to a pdflush thread.  Also for passing
     81  * state information between pdflush threads.  Protected by pdflush_lock.
     82  */
     83 struct pdflush_work {
     84         struct task_struct *who;        /* The thread */
     85         void (*fn)(unsigned long);      /* A callback function */
     86         unsigned long arg0;             /* An argument to the callback */
     87         struct list_head list;          /* On pdflush_list, when idle */
     88         unsigned long when_i_went_to_sleep;
     89 };
     90


This defines the per-thread node structure. It is simple enough to need no further comment.

Now that we have gone through the basic data structures and variables, we start the analysis from the module_init entry point:

    232 static int __init pdflush_init(void)
    233 {
    234         int i;
    235
    236         for (i = 0; i < MIN_PDFLUSH_THREADS; i++)
    237                 start_one_pdflush_thread();
    238         return 0;
    239 }
    240
    241 module_init(pdflush_init);


This creates MIN_PDFLUSH_THREADS pdflush thread instances. Note that there is a module_init() but no module_exit(). The implication is that even if this code were built as a kernel module, it could be loaded but not removed. See the implementation of sys_delete_module():

File: kernel/module.c

    609     /* If it has an init func, it must have an exit func to unload */
    610     if ((mod->init != NULL && mod->exit == NULL)
    611         || mod->unsafe) {
    612         forced = try_force(flags);
    613         if (!forced) {
    614             /* This module can't be removed */
    615             ret = -EBUSY;
    616             goto out;
    617         }
    618     }

    498 #ifdef CONFIG_MODULE_FORCE_UNLOAD
    499 static inline int try_force(unsigned int flags)
    500 {
    501     int ret = (flags & O_TRUNC);
    502     if (ret)
    503         add_taint(TAINT_FORCED_MODULE);
    504     return ret;
    505 }
    506 #else
    507 static inline int try_force(unsigned int flags)
    508 {
    509     return 0;
    510 }
    511 #endif /* CONFIG_MODULE_FORCE_UNLOAD */


So unless forced module unloading (CONFIG_MODULE_FORCE_UNLOAD) was enabled at build time, such a module cannot be unloaded. That option is dangerous, so do not try it. Back to pdflush:

    227 static void start_one_pdflush_thread(void)
    228 {
    229         kthread_run(pdflush, NULL, "pdflush");
    230 }
    231


kthread_run creates the pdflush kernel thread instance with the help of the kthread helper thread:

    164 /*
    165  * Of course, my_work wants to be just a local in __pdflush().  It is
    166  * separated out in this manner to hopefully prevent the compiler from
    167  * performing unfortunate optimisations against the auto variables.  Because
    168  * these are visible to other tasks and CPUs.  (No problem has actually
    169  * been observed.  This is just paranoia).
    170  */

This comment is interesting. To keep the compiler from optimizing the local variable my_work into registers, the flow is split into pdflush() wrapping __pdflush(). Using a local variable instead of dynamically allocated memory is also a win in both space and time.

    171 static int pdflush(void *dummy)
    172 {
    173         struct pdflush_work my_work;
    174         cpumask_t cpus_allowed;
    175
    176         /*
    177          * pdflush can spend a lot of time doing encryption via dm-crypt.  We
    178          * don't want to do that at keventd's priority.
    179          */
    180         set_user_nice(current, 0);

Adjust the priority to improve overall system responsiveness.

    181
    182         /*
    183          * Some configs put our parent kthread in a limited cpuset,
    184          * which kthread() overrides, forcing cpus_allowed == CPU_MASK_ALL.
    185          * Our needs are more modest - cut back to our cpusets cpus_allowed.
    186          * This is needed as pdflush's are dynamically created and destroyed.
    187          * The boottime pdflush's are easily placed w/o these 2 lines.
    188          */
    189         cpus_allowed = cpuset_cpus_allowed(current);
    190         set_cpus_allowed(current, cpus_allowed);

Set the mask of CPUs the thread is allowed to run on.

    191
    192         return __pdflush(&my_work);
    193 }

     91 static int __pdflush(struct pdflush_work *my_work)
     92 {
     93         current->flags |= PF_FLUSHER;
     94         my_work->fn = NULL;
     95         my_work->who = current;
     96         INIT_LIST_HEAD(&my_work->list);

Some initialization.

     97
     98         spin_lock_irq(&pdflush_lock);

nr_pdflush_threads and pdflush_list are about to be modified, so the lock must be taken. Because work may be handed to pdflush from hard-interrupt context, interrupts are disabled at the same time.

     99         nr_pdflush_threads++;

Increment nr_pdflush_threads, since there is now one more pdflush kernel thread instance.

    100         for ( ; ; ) {
    101                 struct pdflush_work *pdf;
    102
    103                 set_current_state(TASK_INTERRUPTIBLE);
    104                 list_move(&my_work->list, &pdflush_list);
    105                 my_work->when_i_went_to_sleep = jiffies;
    106                 spin_unlock_irq(&pdflush_lock);
    107
    108                 schedule();

The thread puts itself on the idle list pdflush_list, then gives up the CPU and waits to be woken.

    109                 if (try_to_freeze()) {
    110                         spin_lock_irq(&pdflush_lock);
    111                         continue;
    112                 }

If the thread was just frozen, go around the loop again.

    113
    114                 spin_lock_irq(&pdflush_lock);
    115                 if (!list_empty(&my_work->list)) {
    116                         printk("pdflush: bogus wakeup!\n");
    117                         my_work->fn = NULL;
    118                         continue;
    119                 }
    120                 if (my_work->fn == NULL) {
    121                         printk("pdflush: NULL work function\n");
    122                         continue;
    123                 }
    124                 spin_unlock_irq(&pdflush_lock);

The checks above handle unexpected wakeups.

    125
    126                 (*my_work->fn)(my_work->arg0);
    127

Run the work function with the argument arg0.

    128                 /*
    129                  * Thread creation: For how long have there been zero
    130                  * available threads?
    131                  */
    132                 if (jiffies - last_empty_jifs > 1 * HZ) {
    133                         /* unlocked list_empty() test is OK here */
    134                         if (list_empty(&pdflush_list)) {
    135                                 /* unlocked test is OK here */
    136                                 if (nr_pdflush_threads < MAX_PDFLUSH_THREADS)
    137                                         start_one_pdflush_thread();
    138                         }
    139                 }

If pdflush_list has been empty for more than 1 second and the thread count still has room to grow, start a new pdflush thread instance.

    140
    141                 spin_lock_irq(&pdflush_lock);
    142                 my_work->fn = NULL;
    143
    144                 /*
    145                  * Thread destruction: For how long has the sleepiest
    146                  * thread slept?
    147                  */
    148                 if (list_empty(&pdflush_list))
    149                         continue;

If pdflush_list is still empty, keep looping.

    150                 if (nr_pdflush_threads <= MIN_PDFLUSH_THREADS)
    151                         continue;

If the thread count is not above the minimum, keep looping.

    152                 pdf = list_entry(pdflush_list.prev, struct pdflush_work, list);
    153                 if (jiffies - pdf->when_i_went_to_sleep > 1 * HZ) {
    154                         /* Limit exit rate */
    155                         pdf->when_i_went_to_sleep = jiffies;
    156                         break;                          /* exeunt */
    157                 }

If the last kernel thread on pdflush_list has slept for more than 1 second, the system has probably become idle, so this thread exits. Why the last one? The list is used as a stack, so the element at the bottom of the stack is necessarily the oldest.

    158         }
    159         nr_pdflush_threads--;
    160         spin_unlock_irq(&pdflush_lock);
    161         return 0;

Decrement nr_pdflush_threads and exit this thread.

    162 }
    163


Is something missing? Indeed, nothing seems to handle the SIGCHLD signal. In fact, processes created through kthread clean up after themselves. The parent never needs to wait for them, and no zombie is left behind. See:

File: kernel/workqueue.c

    200     /* SIG_IGN makes children autoreap: see do_notify_parent(). */
    201     sa.sa.sa_handler = SIG_IGN;
    202     sa.sa.sa_flags = 0;
    203     siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
    204     do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);


The sigaction manual page describes the "consequences" of ignoring SIGCHLD in detail:

    POSIX.1-1990 disallowed setting the action for SIGCHLD to SIG_IGN.
    POSIX.1-2001 allows this possibility, so that ignoring SIGCHLD can
    be used to prevent the creation of zombies (see wait(2)).
    Nevertheless, the historical BSD and System V behaviours for ignoring
    SIGCHLD differ, so that the only completely portable method of
    ensuring that terminated children do not become zombies is to catch
    the SIGCHLD signal and perform a wait(2) or similar.


The Linux kernel clearly follows the newer POSIX standard, which gives us an "easy" way to avoid zombies. Be aware, though, that this technique is not completely portable.
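A user-space analogue of this trick can be tried on Linux. The sketch below is only an illustration under stated assumptions, not kernel code: the helper name child_is_autoreaped is made up, and the behavior it checks (wait() failing with ECHILD once the ignored child has exited) is what the Linux wait(2) page documents, not something every Unix guarantees.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Ignore SIGCHLD, fork a child that exits at once, then call wait().
 * Returns 1 if the child was reaped automatically (wait() fails with
 * ECHILD), 0 if wait() collected a child, and -1 on setup errors.
 * Note: SIGCHLD stays ignored for the rest of the process.
 */
int child_is_autoreaped(void)
{
    struct sigaction sa;
    pid_t pid, reaped;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;    /* same disposition the kernel code sets */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: terminate immediately */

    /* Parent: on Linux this blocks until the child is gone, then fails. */
    errno = 0;
    reaped = wait(NULL);
    return (reaped == -1 && errno == ECHILD) ? 1 : 0;
}
```

Portable code should follow the manual page's advice instead: install a SIGCHLD handler and call wait(2) or waitpid(2) there.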

Now go back to __pdflush() once more, this time looking at its races:

    135                                 /* unlocked test is OK here */
    136                                 if (nr_pdflush_threads < MAX_PDFLUSH_THREADS)
    137                                         start_one_pdflush_thread();


Testing the thread count without the lock cannot corrupt data. But if several threads test nr_pdflush_threads in parallel, all conclude that there is still room to grow, and all call start_one_pdflush_thread(), the thread count can exceed MAX_PDFLUSH_THREADS. In the worst case it may approach twice that limit.
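To make the overshoot concrete, here is a sequential model of one unlucky interleaving, followed by a check-and-reserve variant. Both functions are hypothetical user-space illustrations, not kernel code. The model assumes that every running worker reads the counter before any newly created thread has incremented it, which matches the listing, where nr_pdflush_threads++ is executed by the new thread itself.

```c
#include <assert.h>

#define MAX_PDFLUSH_THREADS 8

/*
 * Unlocked check-then-act: each of the `running` workers sees the same
 * stale nr_threads, so each one that passes the test starts a thread.
 * Returns the thread count once all the new threads have registered.
 */
int threads_after_unlocked_check(int nr_threads, int running)
{
    int spawned = 0;

    for (int i = 0; i < running; i++)
        if (nr_threads < MAX_PDFLUSH_THREADS)   /* stale read */
            spawned++;
    return nr_threads + spawned;
}

/*
 * Check-and-reserve: the worker that passes the test bumps the counter
 * before letting the next worker look (in real code, while holding the
 * lock), so the bound holds.
 */
int threads_after_reserved_check(int nr_threads, int running)
{
    for (int i = 0; i < running; i++)
        if (nr_threads < MAX_PDFLUSH_THREADS)
            nr_threads++;       /* reservation is visible to the next check */
    return nr_threads;
}
```

With seven threads that all finish a job together, the first model ends at fourteen threads and the second at eight. Whether the extra locking is worth it is a design trade-off; the kernel comment explicitly accepts the unlocked test.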

Now look at the lines that follow:

    152                 pdf = list_entry(pdflush_list.prev, struct pdflush_work, list);
    153                 if (jiffies - pdf->when_i_went_to_sleep > 1 * HZ) {
    154                         /* Limit exit rate */
    155                         pdf->when_i_went_to_sleep = jiffies;
    156                         break;                          /* exeunt */
    157                 }


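Consider a sudden burst of requests whose work all finishes at the same moment. None of the threads satisfies the test on line 153 as it finishes, so they all go back to sleep. If no new request arrives during the next n seconds, the pool stays at its peak size for those n seconds, because the exit test is only evaluated right after a thread has run a job. This violates the third rule of the design: terminate a thread once the longest sleeper has been asleep for more than one second.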
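The burst scenario can be replayed with a tiny model. This is a hypothetical user-space sketch: exits_after_burst and the HZ value are assumptions, and the model only captures the order of the tests in __pdflush(), not real scheduling.

```c
#include <assert.h>

#define MIN_PDFLUSH_THREADS 2
#define HZ 100  /* assumed tick rate for this sketch */

/*
 * All nr_threads workers finish their job at tick `now` and run the
 * exit test one after another. Returns how many of them would exit.
 */
int exits_after_burst(int nr_threads, unsigned long now)
{
    unsigned long oldest_sleep_start = 0;   /* tail's when_i_went_to_sleep */
    int idle = 0;                           /* workers parked on the list */
    int exits = 0;

    for (int i = 0; i < nr_threads; i++) {
        int alive = nr_threads - exits;

        if (idle > 0 && alive > MIN_PDFLUSH_THREADS &&
            now - oldest_sleep_start > 1 * HZ) {
            oldest_sleep_start = now;       /* "Limit exit rate" */
            exits++;
            continue;
        }
        if (idle == 0)
            oldest_sleep_start = now;       /* first sleeper becomes the tail */
        idle++;
    }
    return exits;
}
```

For eight workers the result is zero exits: the first worker finds the list empty, and every later one finds a tail that went to sleep zero ticks ago. Since nothing re-runs the test while the threads sleep, the count stays at eight until the next job arrives.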

    195 /*
    196  * Attempt to wake up a pdflush thread, and get it to do some work for you.
    197  * Returns zero if it indeed managed to find a worker thread, and passed your
    198  * payload to it.
    199  */
    200 int pdflush_operation(void (*fn)(unsigned long), unsigned long arg0)
    201 {
    202         unsigned long flags;
    203         int ret = 0;
    204
    205         if (fn == NULL)
    206                 BUG();          /* Hard to diagnose if it's deferred */
    207
    208         spin_lock_irqsave(&pdflush_lock, flags);
    209         if (list_empty(&pdflush_list)) {
    210                 spin_unlock_irqrestore(&pdflush_lock, flags);
    211                 ret = -1;
    212         } else {
    213                 struct pdflush_work *pdf;
    214
    215                 pdf = list_entry(pdflush_list.next, struct pdflush_work, list);
    216                 list_del_init(&pdf->list);
    217                 if (list_empty(&pdflush_list))
    218                         last_empty_jifs = jiffies;
    219                 pdf->fn = fn;
    220                 pdf->arg0 = arg0;
    221                 wake_up_process(pdf->who);
    222                 spin_unlock_irqrestore(&pdflush_lock, flags);
    223         }
    224         return ret;
    225 }
    226


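The function above hands work to a pdflush thread. If an idle thread is available, it assigns the job to that thread and wakes it up to run it. If no thread is idle, it returns -1 and the work is not queued.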
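The contract is easy to miss: a caller gets either a worker or an error, never a queue slot. A minimal user-space model of that contract might look like this. The pool array, dispatch(), run_pending() and add_to_total() are all invented for the sketch, and the workers are run by hand instead of being real threads.

```c
#include <assert.h>
#include <stddef.h>

#define NWORKERS 2

/* Hypothetical worker slot, loosely modeled on struct pdflush_work. */
struct worker {
    void (*fn)(unsigned long);  /* job handed to this worker */
    unsigned long arg0;         /* argument for the job */
    int idle;                   /* 1 while waiting for work */
};

struct worker pool[NWORKERS] = { { NULL, 0, 1 }, { NULL, 0, 1 } };

/* Hand fn to an idle worker. Returns 0 on success, -1 if all are busy. */
int dispatch(void (*fn)(unsigned long), unsigned long arg0)
{
    for (int i = 0; i < NWORKERS; i++) {
        if (pool[i].idle) {
            pool[i].idle = 0;
            pool[i].fn = fn;
            pool[i].arg0 = arg0;
            return 0;
        }
    }
    return -1;                  /* nothing is queued for later */
}

/* Stand-in for the workers waking up: run each job, then go idle again. */
void run_pending(void)
{
    for (int i = 0; i < NWORKERS; i++) {
        if (!pool[i].idle) {
            pool[i].fn(pool[i].arg0);
            pool[i].fn = NULL;
            pool[i].idle = 1;
        }
    }
}

unsigned long total;

/* Example job: accumulate the argument. */
void add_to_total(unsigned long n)
{
    total += n;
}
```

With two workers, two calls to dispatch() succeed and a third returns -1 until run_pending() has put the workers back to idle, so a caller has to decide for itself whether to retry or to do the work some other way.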

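Summary

Kernel programming demands rigorous thinking. The slightest carelessness can cause surprises, so however short your code is, you must be extremely careful. The pdflush thread pool has the two problems described above, but neither has serious consequences. They merely fall short of the stated design, so the code should not be held up as a model implementation.

Note:

This article uses "kernel thread", "thread" and "process" interchangeably, but they all mean "kernel thread". There is nothing wrong with that: "thread" is short for "kernel thread", and kernel threads are essentially a group of "processes" that share the kernel data space, so swapping the terms is harmless in some contexts.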

Original: http://blog.chinaunix.net/u/5251/showart_320793.html

Reposted from: https://www.cnblogs.com/yuanfang/archive/2010/12/24/1916227.html
