Linux Kernel Matters: Memory Compaction (memory compact)
Memory compaction
Memory compaction is the second part of the anti-fragmentation patch set developed by Mel Gorman (see "Avoiding fragmentation with page clustering v27" [LWN.net]). After a system has been running for a long time, physical memory tends to become fragmented. Compaction addresses this by migrating pages of the movable (MOVABLE) type so that larger runs of contiguous physical memory are reassembled.
The principle behind memory compaction is fairly simple:
- It uses a technique similar to the fast/slow two-pointer trick: when a zone needs to be compacted, two scanners, migrate_pfn and free_pfn, walk the zone.
- migrate_pfn scans from the start of the zone and collects pages that are already allocated but can be migrated.
- free_pfn scans from the end of the zone and collects free pages.
- After each round of scanning, the migratable pages found by migrate_pfn are migrated into the free pages found by free_pfn.
- When free_pfn and migrate_pfn meet, compaction of the zone is complete.
- After compaction, the migratable pages from the front of the zone have been moved into free pages at the back, which leaves large contiguous blocks of free physical memory at the front for later allocations.
Memory compaction is one of the more important users of page migration; it helps the system reassemble contiguous physical memory.
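The two-scanner idea can be tried out in user space. The following is only a toy sketch, not kernel code: the zone is modelled as an array of page states, and the names toy_page and toy_compact are made up for this illustration. The real scanners work pageblock by pageblock and isolate pages in batches, which is ignored here.

```c
#include <stddef.h>

/* Hypothetical page states for this toy model; not kernel definitions. */
enum toy_page { TOY_FREE, TOY_MOVABLE, TOY_PINNED };

/*
 * Compact a toy "zone". The migrate scanner walks up from index 0 looking
 * for movable pages; the free scanner walks down from the end looking for
 * free pages. Each movable page is "migrated" into the free slot found by
 * the free scanner. The loop stops when the two scanners meet.
 * Returns the number of pages moved.
 */
size_t toy_compact(enum toy_page *zone, size_t nr_pages)
{
    size_t migrate_pfn = 0;
    size_t free_pfn = nr_pages;          /* one past the last page */
    size_t moved = 0;

    while (migrate_pfn < free_pfn) {
        /* Migrate scanner: skip anything that is not movable. */
        if (zone[migrate_pfn] != TOY_MOVABLE) {
            migrate_pfn++;
            continue;
        }

        /* Free scanner: step down to the next free page above migrate_pfn. */
        while (free_pfn > migrate_pfn + 1 && zone[free_pfn - 1] != TOY_FREE)
            free_pfn--;
        if (free_pfn <= migrate_pfn + 1)
            break;                       /* scanners met, nothing left to pair */

        zone[free_pfn - 1] = TOY_MOVABLE; /* destination is now in use */
        zone[migrate_pfn] = TOY_FREE;     /* source becomes free */
        free_pfn--;
        migrate_pfn++;
        moved++;
    }
    return moved;
}
```

Calling toy_compact on a small array with movable pages near the front leaves the front of the array free, while pinned (non-movable) pages stay where they are and limit how large the resulting free run can be.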
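When compaction is triggered
Memory compaction is triggered mainly in the following three ways:
- Manually by the user through /proc/sys/vm/compact_memory; on a NUMA system a single node can also be compacted through /sys/devices/system/node/node<id>/compact.
- By the kcompactd thread, which is similar to kswapd: when memory watermarks are not met, kcompactd is woken up to compact memory asynchronously in the background.
- In the allocation slow path: reaching it means memory pressure is high, so __alloc_pages_slowpath calls __alloc_pages_direct_compact and the allocating task compacts memory synchronously in its own context (direct compaction).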
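The manual trigger is just a write to a file. A minimal sketch, assuming a kernel built with CONFIG_COMPACTION and a caller with root privileges; the helper name trigger_compaction is made up for this example, and the path is passed in so the same helper can be pointed at a per-node file.

```c
#include <stdio.h>

/*
 * Write "1" to a compaction trigger file.
 * Returns 0 on success and -1 on any error (for example, no permission
 * or a kernel without compaction support, where the file does not exist).
 */
int trigger_compaction(const char *path)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = (fputs("1", f) >= 0);
    if (fclose(f) != 0)      /* the write is only flushed to the kernel here */
        ok = 0;
    return ok ? 0 : -1;
}
```

Typical use would be trigger_compaction("/proc/sys/vm/compact_memory") for all zones, or the same call with /sys/devices/system/node/node0/compact to compact only node 0.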
Related data structures
Compaction scans one zone at a time. Unlike kswapd, whose state lives in pg_data_t, compaction therefore records its progress in struct zone:
    struct zone {
        ... ...
    #if defined CONFIG_COMPACTION || defined CONFIG_CMA
        // position at which the free scanner (scanning from the zone end) restarts
        unsigned long compact_cached_free_pfn;
        // positions at which the migrate scanner (scanning from the zone start)
        // restarts; index 0 is used by async compaction, index 1 by sync compaction
        unsigned long compact_cached_migrate_pfn[2];
        // initial start position of the migrate scanner
        unsigned long compact_init_migrate_pfn;
        // initial start position of the free scanner
        unsigned long compact_init_free_pfn;
    #endif
    } ____cacheline_internodealigned_in_smp;
struct compact_control
struct compact_control plays a role similar to struct scan_control in the kswapd reclaim path. It holds the internal state of one compaction run, and it also controls where scanning starts and which policy is applied.
    struct compact_control {
        struct list_head freepages;     /* List of free pages to migrate to */
        struct list_head migratepages;  /* List of pages being migrated */
        unsigned int nr_freepages;      /* Number of isolated free pages */
        unsigned int nr_migratepages;   /* Number of pages to migrate */
        unsigned long free_pfn;         /* isolate_freepages search base */
        unsigned long migrate_pfn;      /* isolate_migratepages search base */
        unsigned long fast_start_pfn;   /* a pfn to start linear scan from */
        struct zone *zone;
        unsigned long total_migrate_scanned;
        unsigned long total_free_scanned;
        unsigned short fast_search_fail;/* failures to use free list searches */
        short search_order;             /* order to start a fast search at */
        const gfp_t gfp_mask;           /* gfp mask of a direct compactor */
        int order;                      /* order a direct compactor needs */
        int migratetype;                /* migratetype of direct compactor */
        const unsigned int alloc_flags; /* alloc flags of a direct compactor */
        const int highest_zoneidx;      /* zone index of a direct compactor */
        enum migrate_mode mode;         /* Async or sync migration mode */
        bool ignore_skip_hint;          /* Scan blocks even if marked skip */
        bool no_set_skip_hint;          /* Don't mark blocks for skipping */
        bool ignore_block_suitable;     /* Scan blocks considered unsuitable */
        bool direct_compaction;         /* False from kcompactd or /proc/... */
        bool whole_zone;                /* Whole zone should/has been scanned */
        bool contended;                 /* Signal lock or sched contention */
        bool rescan;                    /* Rescanning the same pageblock */
        bool alloc_contig;              /* alloc_contig_range allocation */
    };
Main members:
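- struct list_head freepages: list of free pages that serve as migration targets. Pages on this list have been isolated from the buddy allocator so that it cannot hand them out to other users in the meantime.
- struct list_head migratepages: list of the pages to be migrated in this round. Pages on this list have been isolated so that they cannot be swapped out to disk or, for page cache pages, freed while migration is in progress.
- unsigned int nr_freepages: number of isolated free pages on freepages.
- unsigned int nr_migratepages: number of isolated pages on migratepages that are waiting to be migrated.
- unsigned long free_pfn: page frame number at which the free scanner, which runs from the end of the zone towards the start, begins looking for free pages in this scan.
- unsigned long migrate_pfn: page frame number at which the migrate scanner, which runs from the start of the zone towards the end, begins looking for pages to migrate in this scan.
- unsigned long fast_start_pfn: pfn from which the linear scan starts after a fast search.
- struct zone *zone: the zone being scanned.
- unsigned long total_migrate_scanned: number of pages scanned so far by the migrate scanner.
- unsigned long total_free_scanned: number of pages scanned so far by the free scanner while looking for migration targets.
- const gfp_t gfp_mask: gfp mask of the allocation that triggered direct compaction.
- short search_order: order at which the fast free-list search starts.
- int order: order of the allocation that direct compaction is trying to satisfy.
- int migratetype: migratetype of the allocation that triggered direct compaction.
- const int highest_zoneidx: highest zone index the triggering allocation may use.
- enum migrate_mode mode: page migration mode, asynchronous or synchronous. This is independent of who started the compaction, which is recorded in direct_compaction.
- bool direct_compaction: false if compaction was started by the kcompactd thread or manually through /proc.
- bool whole_zone: whether the whole zone should be (or has been) scanned.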
compact_zone()
compact_zone() is the core function of memory compaction. Whichever way compaction is triggered, it ends up calling compact_zone() on the chosen zone.
Call graph
[Figure: call graph of compact_zone, organised by trigger path]
Every trigger path described above ultimately relies on compact_zone to carry out the page migration.
compact_zone source
    static enum compact_result
    compact_zone(struct compact_control *cc, struct capture_control *capc)
    {
        enum compact_result ret;
        unsigned long start_pfn = cc->zone->zone_start_pfn;
        unsigned long end_pfn = zone_end_pfn(cc->zone);
        unsigned long last_migrated_pfn;
        const bool sync = cc->mode != MIGRATE_ASYNC;
        bool update_cached;

        /*
         * These counters track activities during zone compaction. Initialize
         * them before compacting a new zone.
         */
        cc->total_migrate_scanned = 0;
        cc->total_free_scanned = 0;
        cc->nr_migratepages = 0;
        cc->nr_freepages = 0;
        INIT_LIST_HEAD(&cc->freepages);
        INIT_LIST_HEAD(&cc->migratepages);

        cc->migratetype = gfp_migratetype(cc->gfp_mask);
        ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
                                  cc->highest_zoneidx);
        /* Compaction is likely to fail */
        if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
            return ret;

        /* huh, compaction_suitable is returning something unexpected */
        VM_BUG_ON(ret != COMPACT_CONTINUE);

        /*
         * Clear pageblock skip if there were failures recently and compaction
         * is about to be retried after being deferred.
         */
        if (compaction_restarting(cc->zone, cc->order))
            __reset_isolation_suitable(cc->zone);

        /*
         * Setup to move all movable pages to the end of the zone. Used cached
         * information on where the scanners should start (unless we explicitly
         * want to compact the whole zone), but check that it is initialised
         * by ensuring the values are within zone boundaries.
         */
        cc->fast_start_pfn = 0;
        if (cc->whole_zone) {
            cc->migrate_pfn = start_pfn;
            cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
        } else {
            cc->migrate_pfn = cc->zone->compact_cached_migrate_pfn[sync];
            cc->free_pfn = cc->zone->compact_cached_free_pfn;
            if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
                cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
                cc->zone->compact_cached_free_pfn = cc->free_pfn;
            }
            if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
                cc->migrate_pfn = start_pfn;
                cc->zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
                cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
            }

            if (cc->migrate_pfn <= cc->zone->compact_init_migrate_pfn)
                cc->whole_zone = true;
        }

        last_migrated_pfn = 0;

        /*
         * Migrate has separate cached PFNs for ASYNC and SYNC* migration on
         * the basis that some migrations will fail in ASYNC mode. However,
         * if the cached PFNs match and pageblocks are skipped due to having
         * no isolation candidates, then the sync state does not matter.
         * Until a pageblock with isolation candidates is found, keep the
         * cached PFNs in sync to avoid revisiting the same blocks.
         */
        update_cached = !sync &&
            cc->zone->compact_cached_migrate_pfn[0] == cc->zone->compact_cached_migrate_pfn[1];

        trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
                                  cc->free_pfn, end_pfn, sync);

        migrate_prep_local();

        while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) {
            int err;
            unsigned long start_pfn = cc->migrate_pfn;

            /*
             * Avoid multiple rescans which can happen if a page cannot be
             * isolated (dirty/writeback in async mode) or if the migrated
             * pages are being allocated before the pageblock is cleared.
             * The first rescan will capture the entire pageblock for
             * migration. If it fails, it'll be marked skip and scanning
             * will proceed as normal.
             */
            cc->rescan = false;
            if (pageblock_start_pfn(last_migrated_pfn) ==
                pageblock_start_pfn(start_pfn)) {
                cc->rescan = true;
            }

            switch (isolate_migratepages(cc)) {
            case ISOLATE_ABORT:
                ret = COMPACT_CONTENDED;
                putback_movable_pages(&cc->migratepages);
                cc->nr_migratepages = 0;
                goto out;
            case ISOLATE_NONE:
                if (update_cached) {
                    cc->zone->compact_cached_migrate_pfn[1] =
                        cc->zone->compact_cached_migrate_pfn[0];
                }

                /*
                 * We haven't isolated and migrated anything, but
                 * there might still be unflushed migrations from
                 * previous cc->order aligned block.
                 */
                goto check_drain;
            case ISOLATE_SUCCESS:
                update_cached = false;
                last_migrated_pfn = start_pfn;
            }

            err = migrate_pages(&cc->migratepages, compaction_alloc,
                                compaction_free, (unsigned long)cc, cc->mode,
                                MR_COMPACTION);

            trace_mm_compaction_migratepages(cc->nr_migratepages, err,
                                             &cc->migratepages);

            /* All pages were either migrated or will be released */
            cc->nr_migratepages = 0;
            if (err) {
                putback_movable_pages(&cc->migratepages);
                /*
                 * migrate_pages() may return -ENOMEM when scanners meet
                 * and we want compact_finished() to detect it
                 */
                if (err == -ENOMEM && !compact_scanners_met(cc)) {
                    ret = COMPACT_CONTENDED;
                    goto out;
                }
                /*
                 * We failed to migrate at least one page in the current
                 * order-aligned block, so skip the rest of it.
                 */
                if (cc->direct_compaction &&
                    (cc->mode == MIGRATE_ASYNC)) {
                    cc->migrate_pfn = block_end_pfn(
                            cc->migrate_pfn - 1, cc->order);
                    /* Draining pcplists is useless in this case */
                    last_migrated_pfn = 0;
                }
            }

    check_drain:
            /*
             * Has the migration scanner moved away from the previous
             * cc->order aligned block where we migrated from? If yes,
             * flush the pages that were freed, so that they can merge and
             * compact_finished() can detect immediately if allocation
             * would succeed.
             */
            if (cc->order > 0 && last_migrated_pfn) {
                unsigned long current_block_start =
                    block_start_pfn(cc->migrate_pfn, cc->order);

                if (last_migrated_pfn < current_block_start) {
                    lru_add_drain_cpu_zone(cc->zone);
                    /* No more flushing until we migrate again */
                    last_migrated_pfn = 0;
                }
            }

            /* Stop if a page has been captured */
            if (capc && capc->page) {
                ret = COMPACT_SUCCESS;
                break;
            }
        }

    out:
        /*
         * Release free pages and update where the free scanner should restart,
         * so we don't leave any returned pages behind in the next attempt.
         */
        if (cc->nr_freepages > 0) {
            unsigned long free_pfn = release_freepages(&cc->freepages);

            cc->nr_freepages = 0;
            VM_BUG_ON(free_pfn == 0);
            /* The cached pfn is always the first in a pageblock */
            free_pfn = pageblock_start_pfn(free_pfn);
            /*
             * Only go back, not forward. The cached pfn might have been
             * already reset to zone end in compact_finished()
             */
            if (free_pfn > cc->zone->compact_cached_free_pfn)
                cc->zone->compact_cached_free_pfn = free_pfn;
        }

        count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned);
        count_compact_events(COMPACTFREE_SCANNED, cc->total_free_scanned);

        trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
                                cc->free_pfn, end_pfn, sync, ret);

        return ret;
    }
compact_zone flow
The overall logic of the function is fairly clear:
- It compacts the single zone given in cc->zone.
- It initializes several fields of compact_control.
- compaction_suitable: decides from the zone's watermarks whether compaction should run at all. Compaction is expensive, so it is not started when the allocation would already succeed (COMPACT_SUCCESS) or when there are too few free pages for it to work (COMPACT_SKIPPED). The watermark checks themselves are implemented in __compaction_suitable.
- compact_finished: checks whether scanning of the current zone is done. The migrate scanner (migrate_pfn) starts at the head of the zone and the free scanner (free_pfn) at the tail; when the two meet, the scan is considered complete. The work is done by __compact_finished.
- isolate_migratepages: isolates the pages selected for migration so that they cannot be freed or swapped out while they are being migrated. This is a required preparation step for page migration; see the article "linux那些事之页迁移 (page migration)" for details.
- migrate_pages: migrates the isolated pages.
- When the scan finishes, free_pfn and related state are saved back into the zone to record where this run stopped, so that the next compaction run can resume from there.
- The return value, of type enum compact_result, reports the outcome of the compaction.
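How the two scanners pick their start positions can be isolated into a small user-space sketch. It mirrors the branch structure of compact_zone quoted above, but the types toy_zone and toy_cc, the function toy_pick_start, and the fixed pageblock order of 9 (512 pages) are assumptions made for this illustration.

```c
#include <stdbool.h>

#define TOY_PAGEBLOCK_ORDER    9                      /* assumed pageblock order */
#define TOY_PAGEBLOCK_NR_PAGES (1UL << TOY_PAGEBLOCK_ORDER)

/* Round a pfn down to the first pfn of its pageblock. */
unsigned long toy_pageblock_start_pfn(unsigned long pfn)
{
    return pfn & ~(TOY_PAGEBLOCK_NR_PAGES - 1);
}

struct toy_zone {
    unsigned long start_pfn, end_pfn;     /* zone spans [start_pfn, end_pfn) */
    unsigned long cached_migrate_pfn[2];  /* [0] async, [1] sync */
    unsigned long cached_free_pfn;
    unsigned long init_migrate_pfn;
};

struct toy_cc {
    unsigned long migrate_pfn, free_pfn;
    bool whole_zone;
};

/* Choose scanner start positions, falling back when the cache is stale. */
void toy_pick_start(struct toy_zone *z, struct toy_cc *cc, bool sync)
{
    if (cc->whole_zone) {
        cc->migrate_pfn = z->start_pfn;
        cc->free_pfn = toy_pageblock_start_pfn(z->end_pfn - 1);
        return;
    }

    cc->migrate_pfn = z->cached_migrate_pfn[sync];
    cc->free_pfn = z->cached_free_pfn;

    /* Cached values outside the zone mean "not initialised yet". */
    if (cc->free_pfn < z->start_pfn || cc->free_pfn >= z->end_pfn) {
        cc->free_pfn = toy_pageblock_start_pfn(z->end_pfn - 1);
        z->cached_free_pfn = cc->free_pfn;
    }
    if (cc->migrate_pfn < z->start_pfn || cc->migrate_pfn >= z->end_pfn) {
        cc->migrate_pfn = z->start_pfn;
        z->cached_migrate_pfn[0] = cc->migrate_pfn;
        z->cached_migrate_pfn[1] = cc->migrate_pfn;
    }

    /* Starting at the very beginning is the same as a whole-zone scan. */
    if (cc->migrate_pfn <= z->init_migrate_pfn)
        cc->whole_zone = true;
}
```

The design point worth noticing is that the cached positions are only hints: they are validated against the zone boundaries on every run, and a run that happens to start at the initial position is promoted to a whole-zone scan, which later affects whether COMPACT_COMPLETE or COMPACT_PARTIAL_SKIPPED is reported.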
Key functions
__compaction_suitable
__compaction_suitable decides whether the current zone is suitable for compaction right now:
    /*
     * compaction_suitable: Is this suitable to run compaction on this zone now?
     * Returns
     *   COMPACT_SKIPPED  - If there are too few free pages for compaction
     *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
     *   COMPACT_CONTINUE - If compaction should run now
     */
    static enum compact_result __compaction_suitable(struct zone *zone, int order,
                                            unsigned int alloc_flags,
                                            int highest_zoneidx,
                                            unsigned long wmark_target)
    {
        unsigned long watermark;

        if (is_via_compact_memory(order))
            return COMPACT_CONTINUE;

        watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
        /*
         * If watermarks for high-order allocation are already met, there
         * should be no need for compaction at all.
         */
        if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
                              alloc_flags))
            return COMPACT_SUCCESS;

        /*
         * Watermarks for order-0 must be met for compaction to be able to
         * isolate free pages for migration targets. This means that the
         * watermark and alloc_flags have to match, or be more pessimistic than
         * the check in __isolate_free_page(). We don't use the direct
         * compactor's alloc_flags, as they are not relevant for freepage
         * isolation. We however do use the direct compactor's highest_zoneidx
         * to skip over zones where lowmem reserves would prevent allocation
         * even if compaction succeeds.
         * For costly orders, we require low watermark instead of min for
         * compaction to proceed to increase its chances.
         * ALLOC_CMA is used, as pages in CMA pageblocks are considered
         * suitable migration targets
         */
        watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
                        low_wmark_pages(zone) : min_wmark_pages(zone);
        watermark += compact_gap(order);
        if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
                                 ALLOC_CMA, wmark_target))
            return COMPACT_SKIPPED;

        return COMPACT_CONTINUE;
    }
The main cases are:
- is_via_compact_memory: checks whether compaction was forced manually through /proc/sys/vm/compact_memory. If so, COMPACT_CONTINUE is returned immediately and compaction proceeds.
- zone_watermark_ok: if the zone's watermark is already met for the requested order, an allocation of that order can succeed without compaction, so COMPACT_SUCCESS is returned.
- Otherwise the function checks whether compaction is able to help. Compaction needs free order-0 pages as migration targets, so the zone must stay above the min watermark (the low watermark for costly orders) plus compact_gap(order). If it does, COMPACT_CONTINUE is returned and compaction runs. If it does not, there are too few free pages for compaction to work, so this compaction attempt is skipped and COMPACT_SKIPPED is returned.
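The three-way decision can be modelled with plain counters. This is a deliberately simplified sketch, not the kernel's watermark code: lowmem reserves, CMA and alloc_flags are ignored, the first check always uses the min watermark, and the names toy_zone_stats and toy_compaction_suitable are invented here. Only the compact_gap formula (2UL << order) and the costly-order threshold of 3 are taken from the kernel.

```c
#define TOY_MAX_ORDER    11
#define TOY_COSTLY_ORDER 3    /* same value as PAGE_ALLOC_COSTLY_ORDER */

enum toy_result { TOY_SKIPPED, TOY_CONTINUE, TOY_SUCCESS };

struct toy_zone_stats {
    unsigned long nr_free[TOY_MAX_ORDER];  /* free blocks per order */
    unsigned long wmark_min, wmark_low;    /* watermarks, in pages */
};

/* Total number of free pages across all orders. */
unsigned long toy_free_pages(const struct toy_zone_stats *z)
{
    unsigned long total = 0;

    for (int o = 0; o < TOY_MAX_ORDER; o++)
        total += z->nr_free[o] << o;
    return total;
}

/* Headroom compaction wants: room for migration sources and targets. */
unsigned long toy_compact_gap(int order)
{
    return 2UL << order;
}

enum toy_result toy_compaction_suitable(const struct toy_zone_stats *z, int order)
{
    unsigned long free = toy_free_pages(z);
    unsigned long wmark;

    if (order == -1)                 /* forced via /proc: always compact */
        return TOY_CONTINUE;

    /* Case 1: a block of the requested order (or larger) is already free. */
    if (free > z->wmark_min) {
        for (int o = order; o < TOY_MAX_ORDER; o++)
            if (z->nr_free[o])
                return TOY_SUCCESS;
    }

    /* Case 2: enough free pages to act as migration targets? */
    wmark = (order > TOY_COSTLY_ORDER) ? z->wmark_low : z->wmark_min;
    wmark += toy_compact_gap(order);
    if (free <= wmark)
        return TOY_SKIPPED;          /* too few free pages to compact */

    return TOY_CONTINUE;
}
```

For example, with 1000 free order-0 pages and watermarks of 100 (min) and 200 (low), an order-4 request yields TOY_CONTINUE; with only 150 free pages the same request yields TOY_SKIPPED, because 150 is below 200 + 32.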
compact_result
compact_result describes the outcome of a compaction run:
    /* Return values for compact_zone() and try_to_compact_pages() */
    /* When adding new states, please adjust include/trace/events/compaction.h */
    enum compact_result {
        /* For more detailed tracepoint output - internal to compaction */
        COMPACT_NOT_SUITABLE_ZONE,
        /*
         * compaction didn't start as it was not possible or direct reclaim
         * was more suitable
         */
        COMPACT_SKIPPED,
        /* compaction didn't start as it was deferred due to past failures */
        COMPACT_DEFERRED,

        /* compaction not active last round */
        COMPACT_INACTIVE = COMPACT_DEFERRED,

        /* For more detailed tracepoint output - internal to compaction */
        COMPACT_NO_SUITABLE_PAGE,
        /* compaction should continue to another pageblock */
        COMPACT_CONTINUE,

        /*
         * The full zone was compacted scanned but wasn't successfull to compact
         * suitable pages.
         */
        COMPACT_COMPLETE,
        /*
         * direct compaction has scanned part of the zone but wasn't successfull
         * to compact suitable pages.
         */
        COMPACT_PARTIAL_SKIPPED,

        /* compaction terminated prematurely due to lock contentions */
        COMPACT_CONTENDED,

        /*
         * direct compaction terminated after concluding that the allocation
         * should now succeed
         */
        COMPACT_SUCCESS,
    };
- COMPACT_NOT_SUITABLE_ZONE: the zone is not suitable for compaction.
- COMPACT_SKIPPED: the preconditions for compaction are not met, so the zone is skipped.
- COMPACT_DEFERRED: compaction did not start because it was deferred after earlier failures.
- COMPACT_INACTIVE: compaction was not active in the last round.
- COMPACT_NO_SUITABLE_PAGE: no suitable free page is available yet.
- COMPACT_CONTINUE: compaction should continue with the next pageblock.
- COMPACT_COMPLETE: the whole zone has been scanned, but no suitable page could be produced.
- COMPACT_PARTIAL_SKIPPED: direct compaction has scanned part of the zone, but still no suitable page could be produced.
- COMPACT_CONTENDED: compaction stopped early because of lock contention.
- COMPACT_SUCCESS: compaction of the zone has reached a state in which the allocation should succeed, so it can stop.
__compact_finished
__compact_finished decides whether the current compaction scan is finished:
    static enum compact_result __compact_finished(struct compact_control *cc)
    {
        unsigned int order;
        const int migratetype = cc->migratetype;
        int ret;

        /* Compaction run completes if the migrate and free scanner meet */
        if (compact_scanners_met(cc)) {
            /* Let the next compaction start anew. */
            reset_cached_positions(cc->zone);

            /*
             * Mark that the PG_migrate_skip information should be cleared
             * by kswapd when it goes to sleep. kcompactd does not set the
             * flag itself as the decision to be clear should be directly
             * based on an allocation request.
             */
            if (cc->direct_compaction)
                cc->zone->compact_blockskip_flush = true;

            if (cc->whole_zone)
                return COMPACT_COMPLETE;
            else
                return COMPACT_PARTIAL_SKIPPED;
        }

        if (is_via_compact_memory(cc->order))
            return COMPACT_CONTINUE;

        /*
         * Always finish scanning a pageblock to reduce the possibility of
         * fallbacks in the future. This is particularly important when
         * migration source is unmovable/reclaimable but it's not worth
         * special casing.
         */
        if (!IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
            return COMPACT_CONTINUE;

        /* Direct compactor: Is a suitable page free? */
        ret = COMPACT_NO_SUITABLE_PAGE;
        for (order = cc->order; order < MAX_ORDER; order++) {
            struct free_area *area = &cc->zone->free_area[order];
            bool can_steal;

            /* Job done if page is free of the right migratetype */
            if (!free_area_empty(area, migratetype))
                return COMPACT_SUCCESS;

    #ifdef CONFIG_CMA
            /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
            if (migratetype == MIGRATE_MOVABLE &&
                !free_area_empty(area, MIGRATE_CMA))
                return COMPACT_SUCCESS;
    #endif
            /*
             * Job done if allocation would steal freepages from
             * other migratetype buddy lists.
             */
            if (find_suitable_fallback(area, order, migratetype,
                                       true, &can_steal) != -1) {

                /* movable pages are OK in any pageblock */
                if (migratetype == MIGRATE_MOVABLE)
                    return COMPACT_SUCCESS;

                /*
                 * We are stealing for a non-movable allocation. Make
                 * sure we finish compacting the current pageblock
                 * first so it is as free as possible and we won't
                 * have to steal another one soon. This only applies
                 * to sync compaction, as async compaction operates
                 * on pageblocks of the same migratetype.
                 */
                if (cc->mode == MIGRATE_ASYNC ||
                    IS_ALIGNED(cc->migrate_pfn,
                               pageblock_nr_pages)) {
                    return COMPACT_SUCCESS;
                }

                ret = COMPACT_CONTINUE;
                break;
            }
        }

        if (cc->contended || fatal_signal_pending(current))
            ret = COMPACT_CONTENDED;

        return ret;
    }
- compact_scanners_met: checks whether cc->free_pfn and cc->migrate_pfn have met. If they have, the scan is finished and it returns true.
- When compact_scanners_met returns true: if cc->whole_zone is true, the whole zone had to be scanned and has been, so COMPACT_COMPLETE is returned. If cc->whole_zone is false, the scan started from the cached positions and covered only part of the zone without satisfying the allocation, so COMPACT_PARTIAL_SKIPPED is returned.
- When compact_scanners_met returns false and is_via_compact_memory() is true, compaction was triggered manually through /proc/sys/vm/compact_memory and must go on to the next pageblock of the zone, so COMPACT_CONTINUE is returned.
- If cc->migrate_pfn is not pageblock aligned, the current pageblock has not been fully scanned yet, so scanning continues (COMPACT_CONTINUE).
- The fairly long loop that follows checks, for every order from cc->order up to MAX_ORDER - 1, whether the free lists now hold a page the pending allocation could use: a free page of the requested migratetype, a CMA page for a MOVABLE request, or a page that can be taken from a fallback migratetype. If such a page exists the job is done and COMPACT_SUCCESS is returned (for a non-movable request in sync mode, COMPACT_CONTINUE is returned instead until the current pageblock has been finished). If no such page exists, COMPACT_NO_SUITABLE_PAGE is returned, which the compact_finished wrapper converts to COMPACT_CONTINUE so that scanning goes on.
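The order of the checks above can be condensed into a small user-space model. This is a sketch under simplifying assumptions: fallback stealing, CMA and the contention check are left out, and toy_state, toy_finish and toy_compact_finished are names invented for this example. The scanners-met test compares pageblock indices, as compact_scanners_met does in the kernel, and order -1 stands for a /proc-triggered run.

```c
#include <stdbool.h>

#define TOY_MAX_ORDER        11
#define TOY_NR_MIGRATETYPES  3

enum toy_finish {
    TOY_FIN_CONTINUE,
    TOY_FIN_COMPLETE,
    TOY_FIN_PARTIAL_SKIPPED,
    TOY_FIN_NO_SUITABLE_PAGE,
    TOY_FIN_SUCCESS,
};

struct toy_state {
    unsigned long migrate_pfn, free_pfn;
    unsigned long pageblock_nr_pages;   /* e.g. 512 */
    int order;                          /* -1 means "triggered via /proc" */
    int migratetype;                    /* index into nr_free[order][] */
    bool whole_zone;
    /* number of free blocks per order and migratetype */
    unsigned long nr_free[TOY_MAX_ORDER][TOY_NR_MIGRATETYPES];
};

enum toy_finish toy_compact_finished(const struct toy_state *s)
{
    /* 1. Scanners met: the free scanner's pageblock is not above the
     *    migrate scanner's pageblock any more. */
    if (s->free_pfn / s->pageblock_nr_pages <=
        s->migrate_pfn / s->pageblock_nr_pages)
        return s->whole_zone ? TOY_FIN_COMPLETE : TOY_FIN_PARTIAL_SKIPPED;

    /* 2. Manual trigger: keep going until the scanners meet. */
    if (s->order == -1)
        return TOY_FIN_CONTINUE;

    /* 3. Always finish the pageblock that is currently being scanned. */
    if (s->migrate_pfn % s->pageblock_nr_pages)
        return TOY_FIN_CONTINUE;

    /* 4. Is a block of sufficient order and the right migratetype free? */
    for (int o = s->order; o < TOY_MAX_ORDER; o++)
        if (s->nr_free[o][s->migratetype])
            return TOY_FIN_SUCCESS;

    return TOY_FIN_NO_SUITABLE_PAGE;
}
```

A caller that loops on this function would treat TOY_FIN_NO_SUITABLE_PAGE like TOY_FIN_CONTINUE, in the same way the kernel wrapper does.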
isolate_migratepages
isolate_migratepages is a fairly long function. Its overall job is to continue the scan, pick out physical pages that are suitable for migration, isolate them, and add them to cc->migratepages:
    static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
    {
        unsigned long block_start_pfn;
        unsigned long block_end_pfn;
        unsigned long low_pfn;
        struct page *page;
        const isolate_mode_t isolate_mode =
            (sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
            (cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0);
        bool fast_find_block;

        // use the fast path to find the pfn at which the migrate scan starts
        low_pfn = fast_find_migrateblock(cc);
        // compute the pageblock-aligned start pfn, clamped to the zone start
        block_start_pfn = pageblock_start_pfn(low_pfn);
        if (block_start_pfn < cc->zone->zone_start_pfn)
            block_start_pfn = cc->zone->zone_start_pfn;

        // did the fast search succeed in finding a pfn?
        fast_find_block = low_pfn != cc->migrate_pfn && !cc->fast_search_fail;

        /* Only scan within a pageblock boundary */
        block_end_pfn = pageblock_end_pfn(low_pfn);

        // walk pageblocks starting at low_pfn until a suitable one is found,
        // without crossing the free scanner
        for (; block_end_pfn <= cc->free_pfn;
                fast_find_block = false,
                low_pfn = block_end_pfn,
                block_start_pfn = block_end_pfn,
                block_end_pfn += pageblock_nr_pages) {

            // the loop may run for a long time, so periodically yield the CPU
            // to give other tasks a chance to run
            if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)))
                cond_resched();

            // check that the pageblock is valid and lies within the zone
            page = pageblock_pfn_to_page(block_start_pfn,
                                         block_end_pfn, cc->zone);
            if (!page)
                continue;

            // if isolation failed recently for this pageblock, do not retry;
            // each pageblock is only checked once
            if (IS_ALIGNED(low_pfn, pageblock_nr_pages) &&
                !fast_find_block && !isolation_suitable(cc, page))
                continue;

            // async compaction only scans MOVABLE pageblocks without huge pages
            if (!suitable_migration_source(cc, page)) {
                update_cached_migrate(cc, block_end_pfn);
                continue;
            }

            /* Perform the isolation */
            // isolate the pages in [low_pfn, block_end_pfn) that qualify for
            // compaction and add them to cc->migratepages
            low_pfn = isolate_migratepages_block(cc, low_pfn,
                                                 block_end_pfn, isolate_mode);

            if (!low_pfn)
                return ISOLATE_ABORT;

            // whether isolation succeeded or failed, stop here and let
            // compact_zone decide what to do next
            break;
        }

        // record the pfn at which the migrate scanner restarts next time
        cc->migrate_pfn = low_pfn;

        return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
    }
migrate_pages
The pages that were found suitable and isolated onto cc->migratepages are then migrated:
    err = migrate_pages(&cc->migratepages, compaction_alloc,
                        compaction_free, (unsigned long)cc, cc->mode,
                        MR_COMPACTION);
For details on page migration, see the article "linux那些事之页迁移 (page migration)".
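The two callbacks passed to migrate_pages are what connect migration to the free scanner: compaction_alloc supplies a destination page from cc->freepages, and compaction_free hands an unused destination back. The sketch below shows only that list discipline in user space. The types toy_page and toy_cc and both functions are hypothetical, and the real compaction_alloc additionally isolates more free pages when the list runs empty.

```c
#include <stddef.h>

struct toy_page {
    unsigned long pfn;
    struct toy_page *next;
};

struct toy_cc {
    struct toy_page *freepages;   /* isolated free pages: migration targets */
    unsigned int nr_freepages;
};

/* Hand out one isolated free page as a migration destination. */
struct toy_page *toy_compaction_alloc(struct toy_cc *cc)
{
    struct toy_page *page = cc->freepages;

    if (!page)
        return NULL;   /* the kernel would try to isolate more pages here */
    cc->freepages = page->next;
    page->next = NULL;
    cc->nr_freepages--;
    return page;
}

/* Return a destination page that ended up unused (migration failed). */
void toy_compaction_free(struct toy_cc *cc, struct toy_page *page)
{
    page->next = cc->freepages;
    cc->freepages = page;
    cc->nr_freepages++;
}
```

Keeping the unused destination on the list, instead of releasing it straight back to the allocator, lets the next migration attempt reuse it without another pass of the free scanner.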