Why the clock source affects performance

Published: 2024/4/11

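A few days ago, while helping a colleague debug a problem, I stumbled on a case where the clock source affects performance. It is fairly typical, so I am writing it down. Others have run into it as well; see Shopee's "[Go] Abnormal CPU usage of the Time.Now function"[1] and "Two frequently used system calls are ~77% slower on AWS EC2"[2].

In both cases the root cause is the same: the vDSO falls back to a real system call, which is slow. What triggers the fallback differs, though. My analysis at the end may also be mistaken; discussion and corrections are welcome.

Also, the point of the cover image is that knowing this much is enough; reading further is of no practical use :(

Symptoms

The figure above is a perf profile. The __clock_gettime system call path accounts for most of the time, which is very odd.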

    // time_demo.go
    // strace -ce clock_gettime go run time_demo.go
    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        for i := 0; i < 10; i++ {
            t1 := time.Now()
            t2 := time.Now()
            fmt.Printf("Time taken: %v\n", t2.Sub(t1))
        }
    }

Above is a minimal reproduction demo that measures the cost of time.Now() directly. strace -ce prints a summary of the system calls made.

    ~# strace -ce clock_gettime go run time_demo.go
    Time taken: 1.983µs
    Time taken: 1.507µs
    Time taken: 2.247µs
    Time taken: 2.993µs
    Time taken: 2.703µs
    Time taken: 1.927µs
    Time taken: 2.091µs
    Time taken: 2.16µs
    Time taken: 2.085µs
    Time taken: 2.234µs
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.001342          13       105           clock_gettime
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.001342                   105           total

Above is the result on the problem machine: a large number of clock_gettime system calls are made.

    ~# strace -ce clock_gettime go run time_demo.go
    Time taken: 138ns
    Time taken: 94ns
    Time taken: 73ns
    Time taken: 88ns
    Time taken: 87ns
    Time taken: 83ns
    Time taken: 93ns
    Time taken: 78ns
    Time taken: 93ns
    Time taken: 99ns

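Above is the result on a healthy machine. The latency is in nanoseconds, roughly 20 times lower than on the problem machine, and no system calls are made at all. Now imagine a service where every request passes through several modules, each collecting plenty of P99 latency statistics: if time.Now itself costs microseconds, the service is barely usable.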
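A single pair of time.Now() calls is a noisy measurement. The following is a minimal sketch, assuming only the Go standard library, that averages the cost over many calls. The helper name avgNowCost and the iteration count are illustrative and do not come from the original demo.

```go
package main

import (
	"fmt"
	"time"
)

// avgNowCost is a hypothetical helper: it calls time.Now() n times in a
// tight loop and returns the average wall-clock cost of one call.
func avgNowCost(n int) time.Duration {
	if n <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		_ = time.Now()
	}
	return time.Since(start) / time.Duration(n)
}

func main() {
	// The iteration count is arbitrary; larger values smooth out noise.
	fmt.Printf("average time.Now() cost: %v\n", avgNowCost(1_000_000))
}
```

An average near the nanosecond figures above suggests the vDSO path is in use, while an average in the microsecond range hints at the syscall fallback.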

On the problem machine the system call looks like this:

    clock_gettime(CLOCK_MONOTONIC, {tv_sec=857882, tv_nsec=454310014}) = 0

The test kernel is 5.4.0-1038.

time.Now()

Let's look at how Go implements time.Now.

    // src/runtime/timestub.go
    //go:linkname time_now time.now
    func time_now() (sec int64, nsec int32, mono int64) {
        sec, nsec = walltime()
        return sec, nsec, nanotime()
    }

The time package only declares the function; the implementation lives in per-platform assembly in the runtime. Focusing on amd64 for now, here is the assembly:

    // src/runtime/sys_linux_amd64.s
    // func walltime1() (sec int64, nsec int32)
    // non-zero frame-size means bp is saved and restored
    TEXT runtime·walltime1(SB),NOSPLIT,$8-12
    ......
    noswitch:
        SUBQ    $16, SP     // Space for results
        ANDQ    $~15, SP    // Align for C code
        MOVQ    runtime·vdsoClockgettimeSym(SB), AX
    ......

So the question is: what is the vDSO?

System calls

Everyone knows that system calls are slow: they trap into the kernel and pay for the context switch. But how slow, exactly?

The figure above compares the overhead of system calls with that of ordinary function calls; it comes from [Measurements of system call performance and overhead](http://arkanis.de/weblog/2017-01-05-measurements-of-system-call-performance-and-overhead). A real system call such as getpid costs far more than a call served through the vDSO, and far more than an ordinary function call.
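The gap can be observed from Go as well. The sketch below is Linux-only and rests on one assumption: syscall.Syscall with SYS_CLOCK_GETTIME always enters the kernel, whereas time.Now() goes through the vDSO when it can. The helper names rawClockGettime and perCall are illustrative.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
	"unsafe"
)

// clockMonotonic is the Linux clock id of CLOCK_MONOTONIC.
const clockMonotonic = 1

// rawClockGettime issues a real clock_gettime system call, bypassing the
// vDSO, and returns the monotonic time in nanoseconds.
func rawClockGettime() (int64, error) {
	var ts syscall.Timespec
	_, _, errno := syscall.Syscall(syscall.SYS_CLOCK_GETTIME,
		clockMonotonic, uintptr(unsafe.Pointer(&ts)), 0)
	if errno != 0 {
		return 0, errno
	}
	return ts.Nano(), nil
}

// perCall runs f n times and returns the average duration of one call.
func perCall(n int, f func()) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		f()
	}
	return time.Since(start) / time.Duration(n)
}

func main() {
	const n = 200_000 // arbitrary iteration count
	viaRuntime := perCall(n, func() { _ = time.Now() })
	viaSyscall := perCall(n, func() { _, _ = rawClockGettime() })
	fmt.Printf("time.Now():  %v per call\n", viaRuntime)
	fmt.Printf("raw syscall: %v per call\n", viaSyscall)
}
```

If the two numbers are close, time.Now() is probably paying for a system call on every invocation, which is the situation described in this post.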

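For the vDSO (virtual dynamic shared object), see vdso man7[3]. It exists because system calls are slow and involve a context switch, while a small number of frequently used calls account for most of the time spent. The ones that are not security-sensitive are therefore mapped from kernel space into user space.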

    x86-64 functions
        The table below lists the symbols exported by the vDSO.  All of
        these symbols are also available without the "__vdso_" prefix,
        but you should ignore those and stick to the names below.

        symbol                 version
        ─────────────────────────────────
        __vdso_clock_gettime   LINUX_2.6
        __vdso_getcpu          LINUX_2.6
        __vdso_gettimeofday    LINUX_2.6
        __vdso_time            LINUX_2.6

Those are the functions the x86 vDSO exports. Only four? Surely there are more than that. Let's check on a real production machine.

    ~# uname -a
    Linux 5.4.0-1041-aws #43~18.04.1-Ubuntu SMP Sat Mar 20 15:47:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    ~# cat /proc/self/maps | grep -i vdso
    7fff2edff000-7fff2ee00000 r-xp 00000000 00:00 0                          [vdso]

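The kernel version is 5.4.0. The maps file shows the vDSO of the current process with permissions r-xp: readable and executable, but not writable. We can dump it and take a look. First run cat in another session so that it waits for input, then attach to it with gdb.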

    ~# ps aux | grep cat
    root      9869  0.0  0.0   9360   792 pts/1    S+   02:18   0:00 cat
    root      9931  0.0  0.0  16152  1100 pts/0    S+   02:18   0:00 grep --color=auto cat
    ~# cat /proc/9869/maps | grep -i vdso
    7ffe717e6000-7ffe717e7000 r-xp 00000000 00:00 0                          [vdso]
    ~# gdb /bin/cat 9869
    ...........
    (gdb) dump memory /tmp/vdso.so 0x7ffe717e6000 0x7ffe717e7000
    (gdb) quit

Now look at the symbol table.

    ~# file /tmp/vdso.so
    /tmp/vdso.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=17d65245b85cd032de7ab130d053551fb0bd284a, stripped
    ~# objdump -T /tmp/vdso.so

    /tmp/vdso.so:     file format elf64-x86-64

    DYNAMIC SYMBOL TABLE:
    0000000000000950  w   DF .text  00000000000000a1  LINUX_2.6   clock_gettime
    00000000000008a0 g    DF .text  0000000000000083  LINUX_2.6   __vdso_gettimeofday
    0000000000000a00  w   DF .text  000000000000000a  LINUX_2.6   clock_getres
    0000000000000a00 g    DF .text  000000000000000a  LINUX_2.6   __vdso_clock_getres
    00000000000008a0  w   DF .text  0000000000000083  LINUX_2.6   gettimeofday
    0000000000000930 g    DF .text  0000000000000015  LINUX_2.6   __vdso_time
    0000000000000930  w   DF .text  0000000000000015  LINUX_2.6   time
    0000000000000950 g    DF .text  00000000000000a1  LINUX_2.6   __vdso_clock_gettime
    0000000000000000 g    DO *ABS*  0000000000000000  LINUX_2.6   LINUX_2.6
    0000000000000a10 g    DF .text  000000000000002a  LINUX_2.6   __vdso_getcpu
    0000000000000a10  w   DF .text  000000000000002a  LINUX_2.6   getcpu

Why go to all this trouble? Because vdso.so is maintained only in memory; unlike other shared libraries, it has no backing file on disk.
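Locating the mapping does not require grep. As a minimal sketch, assuming the /proc/<pid>/maps line format shown above, the following Go program finds the [vdso] range of the current process. The function name parseVDSOLine is illustrative.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVDSOLine extracts the address range from one /proc/<pid>/maps line
// if that line describes the [vdso] mapping.
func parseVDSOLine(line string) (start, end uint64, ok bool) {
	fields := strings.Fields(line)
	if len(fields) == 0 || fields[len(fields)-1] != "[vdso]" {
		return 0, 0, false
	}
	// The first field has the form "<start>-<end>" in hexadecimal.
	parts := strings.SplitN(fields[0], "-", 2)
	if len(parts) != 2 {
		return 0, 0, false
	}
	s, err := strconv.ParseUint(parts[0], 16, 64)
	if err != nil {
		return 0, 0, false
	}
	e, err := strconv.ParseUint(parts[1], 16, 64)
	if err != nil {
		return 0, 0, false
	}
	return s, e, true
}

func main() {
	f, err := os.Open("/proc/self/maps")
	if err != nil {
		fmt.Println("cannot open maps:", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if start, end, ok := parseVDSOLine(sc.Text()); ok {
			fmt.Printf("[vdso] %#x-%#x (%d bytes)\n", start, end, end-start)
		}
	}
}
```

The printed range is what the gdb dump memory command above takes as its two address arguments.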

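After all that, the real question remains: with the vDSO in place, why does getting the time still go through a system call?

Clock sources

On clock sources, the following quotation is from muahao:

During boot the kernel selects a clock source according to a fixed priority, ordered by the precision and access speed of each clock. The TSC register in the CPU has the highest precision (matching the CPU's maximum frequency) and the fastest access (a single instruction, one clock cycle), so the kernel prefers the TSC as its timekeeping clock source. Other clock sources such as HPET, ACPI-PM and PIT serve as fallbacks. Unlike HPET and the others, however, the TSC's frequency is not known in advance, so during initialization the kernel has to calibrate the TSC frequency against clocks such as HPET or PIT. If two calibration results differ too much, the TSC is considered unstable and another clock source is used, and the kernel logs: Clocksource tsc unstable.

Normally the TSC frequency is stable and unaffected by CPU frequency scaling (if the CPU supports constant-tsc), so the kernel should not detect it as unstable. However, computer systems have an interrupt called SMI (System Management Interrupt) that the operating system can neither observe nor mask. If the kernel's TSC calibration routine quick_pit_calibrate() is disturbed by an SMI, the result can be off by a large margin (more than 1%), leaving the TSC base frequency inaccurate. Timestamps on the machine then become inaccurate and may run slow or fast.

When the kernel considers the TSC unstable and switches to HPET or another clock, the system is not affected too badly. Clock precision and clock access speed do suffer, of course: in experiments, reading the HPET costs about 7 times as much as reading the TSC. If your system cannot tolerate that, try the following workaround: add the boot parameter tsc=reliable at kernel startup.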

Kernel implementation

1. Clock source registration

See the timers chapter of linux insides[4]. Each clock source registers itself with clocksource_register_khz (or clocksource_register_hz). Let's look at tsc and xen in turn.

    static int __init init_tsc_clocksource(void)
    {
        ......
        if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
            if (boot_cpu_has(X86_FEATURE_ART))
                art_related_clocksource = &clocksource_tsc;
            clocksource_register_khz(&clocksource_tsc, tsc_khz);
        ......
    }

    static struct clocksource clocksource_tsc = {
        .name           = "tsc",
        .rating         = 300,
        .read           = read_tsc,
        .mask           = CLOCKSOURCE_MASK(64),
        .flags          = CLOCK_SOURCE_IS_CONTINUOUS |
                          CLOCK_SOURCE_VALID_FOR_HRES |
                          CLOCK_SOURCE_MUST_VERIFY,
        .archdata       = { .vclock_mode = VCLOCK_TSC },
        .resume         = tsc_resume,
        .mark_unstable  = tsc_cs_mark_unstable,
        .tick_stable    = tsc_cs_tick_stable,
        .list           = LIST_HEAD_INIT(clocksource_tsc.list),
    };

The vclock_mode of the clocksource_tsc clock source is VCLOCK_TSC.

    static void __init xen_time_init(void)
    {
        ......
        clocksource_register_hz(&xen_clocksource, NSEC_PER_SEC);
        ......
    }

    static void xen_setup_vsyscall_time_info(void)
    {
        ......
        xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
    }

The vclock_mode of the xen clock source is VCLOCK_PVCLOCK.

2. Clock sources and the timekeeper

So how is a clocksource tied to vdso_data? This part is fairly involved. Following "Timers and time management in the Linux kernel"[5] and "vdso segment data update", we arrive at the timekeeping_update function in kernel/time/timekeeping.c, which is responsible for publishing the timekeeper state to the user-space vDSO data area.

    /* must hold timekeeper_lock */
    static void timekeeping_update(struct timekeeper *tk, unsigned int action)
    {
        ......
        update_vsyscall(tk);
        update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
        ......
    }

    void update_vsyscall(struct timekeeper *tk)
    {
        struct vdso_data *vdata = __arch_get_k_vdso_data();
        struct vdso_timestamp *vdso_ts;
        s32 clock_mode;
        u64 nsec;

        /* copy vsyscall data */
        vdso_write_begin(vdata);

        clock_mode = tk->tkr_mono.clock->vdso_clock_mode;
        vdata[CS_HRES_COARSE].clock_mode = clock_mode;
        vdata[CS_RAW].clock_mode = clock_mode;

        /* CLOCK_REALTIME also required for time() */
        vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME];
        vdso_ts->sec = tk->xtime_sec;
        vdso_ts->nsec = tk->tkr_mono.xtime_nsec;

        /* CLOCK_REALTIME_COARSE */
        vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME_COARSE];
        vdso_ts->sec = tk->xtime_sec;
        vdso_ts->nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;

        /* CLOCK_MONOTONIC_COARSE */
        vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC_COARSE];
        vdso_ts->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
        nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;
        nsec = nsec + tk->wall_to_monotonic.tv_nsec;
        vdso_ts->sec += __iter_div_u64_rem(nsec, NSEC_PER_SEC, &vdso_ts->nsec);

        /*
         * Read without the seqlock held by clock_getres().
         * Note: No need to have a second copy.
         */
        WRITE_ONCE(vdata[CS_HRES_COARSE].hrtimer_res, hrtimer_resolution);

        /*
         * If the current clocksource is not VDSO capable, then spare the
         * update of the high resolution parts.
         */
        if (clock_mode != VDSO_CLOCKMODE_NONE)
            update_vdso_data(vdata, tk);

        __arch_update_vsyscall(vdata, tk);

        vdso_write_end(vdata);

        __arch_sync_vdso_data(vdata);
    }

    static void update_pvclock_gtod(struct timekeeper *tk)
    {
        struct pvclock_gtod_data *vdata = &pvclock_gtod_data;
        u64 boot_ns;

        boot_ns = ktime_to_ns(ktime_add(tk->tkr_mono.base, tk->offs_boot));

        write_seqcount_begin(&vdata->seq);

        /* copy pvclock gtod data */
        vdata->clock.vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
        vdata->clock.cycle_last  = tk->tkr_mono.cycle_last;
        vdata->clock.mask        = tk->tkr_mono.mask;
        vdata->clock.mult        = tk->tkr_mono.mult;
        vdata->clock.shift       = tk->tkr_mono.shift;

        vdata->boot_ns           = boot_ns;
        vdata->nsec_base         = tk->tkr_mono.xtime_nsec;

        vdata->wall_time_sec     = tk->xtime_sec;

        write_seqcount_end(&vdata->seq);
    }

The code above comes from the arm vDSO implementation, which is similar to the x86 one.

Next, how does the timekeeper get matched to a clocksource? That happens in the timekeeping_init function.

    void __init timekeeping_init(void)
    {
        struct timespec64 wall_time, boot_offset, wall_to_mono;
        struct timekeeper *tk = &tk_core.timekeeper;
        struct clocksource *clock;
        ......
        clock = clocksource_default_clock();
        if (clock->enable)
            clock->enable(clock);
        tk_setup_internals(tk, clock);
        ...
    }

That is the initialization path. Whenever the clock source changes later, change_clocksource is called to switch over.

3. How the time functions are called

    // linux/lib/vdso/gettimeofday.c
    static __maybe_unused int
    __cvdso_clock_gettime(clockid_t clock, struct __kernel_timespec *ts)
    {
        int ret = __cvdso_clock_gettime_common(clock, ts);

        if (unlikely(ret))
            return clock_gettime_fallback(clock, ts);
        return 0;
    }

    static __always_inline
    long clock_gettime_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
    {
        long ret;

        asm ("syscall" : "=a" (ret), "=m" (*_ts) :
             "0" (__NR_clock_gettime), "D" (_clkid), "S" (_ts) :
             "rcx", "r11");

        return ret;
    }

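Start with the fallback logic: it is simply an inline-assembly syscall instruction. Note that this assembly is platform-specific; the code shown is for x86. unlikely is a branch-prediction hint that says the branch will rarely be taken. A non-zero ret means the vDSO failed to obtain the time, so let's see when __cvdso_clock_gettime_common fails.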

    static __maybe_unused int
    __cvdso_clock_gettime_common(clockid_t clock, struct __kernel_timespec *ts)
    {
        const struct vdso_data *vd = __arch_get_vdso_data();
        u32 msk;

        /* Check for negative values or invalid clocks */
        if (unlikely((u32) clock >= MAX_CLOCKS))
            return -1;

        /*
         * Convert the clockid to a bitmask and use it to check which
         * clocks are handled in the VDSO directly.
         */
        msk = 1U << clock;
        if (likely(msk & VDSO_HRES)) {
            return do_hres(&vd[CS_HRES_COARSE], clock, ts);
        } else if (msk & VDSO_COARSE) {
            do_coarse(&vd[CS_HRES_COARSE], clock, ts);
            return 0;
        } else if (msk & VDSO_RAW) {
            return do_hres(&vd[CS_RAW], clock, ts);
        }
        return -1;
    }

Here we only look at the do_hres implementation.

    static int do_hres(const struct vdso_data *vd, clockid_t clk,
                       struct __kernel_timespec *ts)
    {
        const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
        u64 cycles, last, sec, ns;
        u32 seq;

        do {
            seq = vdso_read_begin(vd);
            cycles = __arch_get_hw_counter(vd->clock_mode);
            ns = vdso_ts->nsec;
            last = vd->cycle_last;
            if (unlikely((s64)cycles < 0))
                return -1;

            ns += vdso_calc_delta(cycles, last, vd->mask, vd->mult);
            ns >>= vd->shift;
            sec = vdso_ts->sec;
        } while (unlikely(vdso_read_retry(vd, seq)));

        /*
         * Do this outside the loop: a race inside the loop could result
         * in __iter_div_u64_rem() being extremely slow.
         */
        ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
        ts->tv_nsec = ns;

        return 0;
    }

__arch_get_hw_counter reads the cycles value according to clock_mode. The value is a u64; if it is negative when cast to s64, do_hres returns -1, which triggers the fallback to the system call.
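The arithmetic in do_hres is easier to follow with concrete numbers. The sketch below is a simplified Go model, not the kernel's code: the vdsoSnapshot type and its field names are hypothetical, the seqlock retry loop is omitted, and vdso_calc_delta is assumed to be the plain masked delta multiplied by mult.

```go
package main

import "fmt"

const nsecPerSec = 1_000_000_000

// vdsoSnapshot is a simplified, hypothetical stand-in for the fields of
// struct vdso_data that do_hres reads.
type vdsoSnapshot struct {
	cycleLast uint64 // counter value at the last timekeeper update
	mask      uint64 // valid bits of the hardware counter
	mult      uint32 // cycles to shifted-nanoseconds multiplier
	shift     uint32 // right shift applied after multiplying
	baseSec   uint64 // seconds at the last update
	baseNsec  uint64 // shifted nanoseconds at the last update
}

// doHres models the conversion from a raw counter value to a timestamp.
// ok == false stands for the "return -1" path that makes the caller fall
// back to the real system call.
func doHres(vd vdsoSnapshot, cycles uint64) (sec, nsec uint64, ok bool) {
	// An all-ones counter value (U64_MAX) is -1 when viewed as signed.
	if int64(cycles) < 0 {
		return 0, 0, false
	}
	ns := vd.baseNsec + ((cycles-vd.cycleLast)&vd.mask)*uint64(vd.mult)
	ns >>= vd.shift
	return vd.baseSec + ns/nsecPerSec, ns % nsecPerSec, true
}

func main() {
	// mult=2 with shift=1 makes one cycle equal one nanosecond.
	vd := vdsoSnapshot{cycleLast: 1000, mask: ^uint64(0), mult: 2, shift: 1, baseSec: 10}

	fmt.Println(doHres(vd, 1000+1_500_000_000)) // prints: 11 500000000 true
	fmt.Println(doHres(vd, ^uint64(0)))         // prints: 0 0 false
}
```

The second call is the interesting one: a counter read that yields the all-ones sentinel never produces a timestamp in user space, so every call ends up in the kernel.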

    static inline u64 __arch_get_hw_counter(s32 clock_mode)
    {
        if (clock_mode == VCLOCK_TSC)
            return (u64)rdtsc_ordered();
        /*
         * For any memory-mapped vclock type, we need to make sure that gcc
         * doesn't cleverly hoist a load before the mode check.  Otherwise we
         * might end up touching the memory-mapped page even if the vclock in
         * question isn't enabled, which will segfault.  Hence the barriers.
         */
    #ifdef CONFIG_PARAVIRT_CLOCK
        if (clock_mode == VCLOCK_PVCLOCK) {
            barrier();
            return vread_pvclock();
        }
    #endif
    #ifdef CONFIG_HYPERV_TIMER
        if (clock_mode == VCLOCK_HVCLOCK) {
            barrier();
            return vread_hvclock();
        }
    #endif
        return U64_MAX;
    }

    static u64 vread_pvclock(void)
    {
        ......
        do {
            version = pvclock_read_begin(pvti);

            if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
                return U64_MAX;

            ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
        } while (pvclock_read_retry(pvti, version));

        return ret;
    }

If flags does not carry the PVCLOCK_TSC_STABLE_BIT flag, vread_pvclock returns U64_MAX. Let's see when that flag is missing.

    static int kvm_guest_time_update(struct kvm_vcpu *v)
    {
        ......
        u64 tsc_timestamp, host_tsc;
        struct kvm_arch *ka = &v->kvm->arch;
        u8 pvclock_flags;
        bool use_master_clock;
        ......
        use_master_clock = ka->use_master_clock;
        ......
        if (use_master_clock)
            pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
    }

    /*
     *
     * Assuming a stable TSC across physical CPUS, and a stable TSC
     * across virtual CPUs, the following condition is possible.
     * Each numbered line represents an event visible to both
     * CPUs at the next numbered event.
     */
    static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
    {
        ......
        ka->use_master_clock = host_tsc_clocksource && vcpus_matched
                               && !ka->backwards_tsc_observed
                               && !ka->boot_vcpu_runs_old_kvmclock;
        ......
    }

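In other words, use_master_clock is set to true only if the host uses the tsc clocksource, the vCPUs' TSCs match, no backwards TSC jump has been observed, and the boot vCPU is not running the old kvmclock. Otherwise it is false.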
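Putting the pieces together, the decision chain runs from host state to guest behaviour. The sketch below is a simplified Go model of that chain: the type and function names are illustrative, and only the four conditions and the stable bit (bit 0 in the kernel headers) are taken from the kernel code quoted above.

```go
package main

import "fmt"

// pvclockTSCStableBit mirrors the kernel's PVCLOCK_TSC_STABLE_BIT (bit 0).
const pvclockTSCStableBit uint8 = 1 << 0

// hostState is a hypothetical summary of the four conditions checked in
// pvclock_update_vm_gtod_copy.
type hostState struct {
	hostTSCClocksource      bool
	vcpusMatched            bool
	backwardsTSCObserved    bool
	bootVCPURunsOldKVMClock bool
}

// useMasterClock models how the host derives use_master_clock.
func useMasterClock(h hostState) bool {
	return h.hostTSCClocksource && h.vcpusMatched &&
		!h.backwardsTSCObserved && !h.bootVCPURunsOldKVMClock
}

// pvclockFlags models the flags the host publishes to the guest.
func pvclockFlags(h hostState) uint8 {
	var flags uint8
	if useMasterClock(h) {
		flags |= pvclockTSCStableBit
	}
	return flags
}

// guestStaysInVDSO models the check in vread_pvclock: without the stable
// bit the read yields U64_MAX and clock_gettime falls back to a syscall.
func guestStaysInVDSO(flags uint8) bool {
	return flags&pvclockTSCStableBit != 0
}

func main() {
	good := hostState{hostTSCClocksource: true, vcpusMatched: true}
	bad := hostState{hostTSCClocksource: false, vcpusMatched: true}

	fmt.Println(guestStaysInVDSO(pvclockFlags(good))) // prints: true
	fmt.Println(guestStaysInVDSO(pvclockFlags(bad)))  // prints: false
}
```

Under this model a guest has no say in the outcome: whether its time calls stay in user space is decided by the host.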

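Which brings us to the actual machine: it is an AWS p3.2xlarge machine-learning instance, and I suspect the host is involved. The other instances I tried, from the c5 family, no longer offer the xen clocksource at all (only tsc, kvm-clock and acpi_pm), and in my test the kvm-clock source also supports the vDSO. According to the official "Making the most of GPU instances" blog[6], the newer Nitro virtualization technology no longer has this problem.

After all that analysis, I may have analyzed nothing at all...

Fix

For older hardware or older kernels, of course, a fix is still necessary.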

    ~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    xen tsc hpet acpi_pm
    ~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    xen

The current clock source is xen. Writing tsc into current_clocksource is all it takes.

    ~# echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource

There is another case, though: the kernel may have marked the TSC as untrustworthy (Clocksource tsc unstable). Then the only options are to reboot, or to boot the kernel with tsc=reliable; see manage-ec2-linux-clock-source[7].

    GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200 clocksource=tsc tsc=reliable"

Then generate the grub.cfg configuration file with grub2-mkconfig -o /boot/grub2/grub.cfg.
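To verify the state before and after applying the fix, the same sysfs files can be read programmatically. This is a minimal sketch that assumes the sysfs paths shown above; the helper names hasClocksource and readTrimmed are illustrative, and the program only reads, it never writes.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// sysfsDir is the clocksource directory used in the commands above.
const sysfsDir = "/sys/devices/system/clocksource/clocksource0/"

// hasClocksource reports whether name appears in the space-separated
// contents of available_clocksource.
func hasClocksource(available, name string) bool {
	for _, cs := range strings.Fields(available) {
		if cs == name {
			return true
		}
	}
	return false
}

// readTrimmed reads a small sysfs file and strips surrounding whitespace.
func readTrimmed(path string) (string, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	current, err := readTrimmed(sysfsDir + "current_clocksource")
	if err != nil {
		fmt.Println("cannot read current clocksource:", err)
		return
	}
	available, err := readTrimmed(sysfsDir + "available_clocksource")
	if err != nil {
		fmt.Println("cannot read available clocksources:", err)
		return
	}

	fmt.Printf("current=%s available=%q\n", current, available)
	if current != "tsc" && hasClocksource(available, "tsc") {
		fmt.Println("tsc is available but not selected")
	}
}
```

A check like this could run at service startup to flag hosts where time.Now() is likely to be slow.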

Summary

That is all for this post; more will follow.

References

[1] [Go] Abnormal CPU usage of the Time.Now function: https://mp.weixin.qq.com/s/D2ulLXDFpi0FwVRwSQJ0nA

[2] Two frequently used system calls are ~77% slower on AWS EC2: https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/

[3] vdso man7: https://man7.org/linux/man-pages/man7/vdso.7.html

[4] linux insides: https://0xax.gitbooks.io/linux-insides/content/Timers/linux-timers-2.html

[5] Timers and time management in the Linux kernel: https://garlicspace.com/2020/06/07/linux%E5%86%85%E6%A0%B8%E4%B8%AD%E7%9A%84%E5%AE%9A%E6%97%B6%E5%99%A8%E5%92%8C%E6%97%B6%E9%97%B4%E7%AE%A1%E7%90%86-part-7/

[6] Official "Making the most of GPU instances" blog: https://aws.amazon.com/cn/blogs/china/using-rekognition-realize-serverless-intelligent-album-playing-with-gpu-instance-iii-system-optimization/

[7] manage-ec2-linux-clock-source: https://aws.amazon.com/premiumsupport/knowledge-center/manage-ec2-linux-clock-source/
