17 Memory compaction


The buddy system manages memory at page granularity, so memory fragmentation is also expressed in pages: a large number of scattered, non-contiguous free pages. From the memory-management point of view fragmentation is a bad thing; in some cases a physical device needs a large, physically contiguous region, and if the kernel cannot satisfy such a request the allocation fails and may even end in a kernel panic. Fragmentation is a bit like a marching column during military training: after walking for a while the ranks become disordered and need to be re-formed. Hence this chapter calls the feature memory compaction (内存规整; some literature renders it as 内存紧凑), a mechanism introduced to counter memory fragmentation.
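Before diving into the direct-compaction path, fragmentation and compaction can be observed from user space. The sketch below is not part of the kernel path analysed in this chapter; it is a small user-space program (assuming root privileges and a kernel built with CONFIG_COMPACTION=y) that dumps /proc/buddyinfo, triggers global compaction by writing to /proc/sys/vm/compact_memory (the same interface that later appears as the order == -1 case in __compaction_suitable()), and dumps /proc/buddyinfo again so the redistribution of free blocks across orders can be seen.

/* User-space sketch (not kernel code): trigger global compaction and compare
 * the per-order free-block counts before and after. */
#include <stdio.h>
#include <stdlib.h>

static void dump_buddyinfo(const char *tag)
{
    char line[512];
    FILE *f = fopen("/proc/buddyinfo", "r");

    if (!f) {
        perror("buddyinfo");
        return;
    }
    printf("--- %s ---\n", tag);
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);    /* one line per zone: free blocks per order */
    fclose(f);
}

int main(void)
{
    FILE *f;

    dump_buddyinfo("before compaction");

    /* Writing any value to compact_memory compacts all zones; internally the
     * kernel uses order == -1 for this path (see __compaction_suitable()). */
    f = fopen("/proc/sys/vm/compact_memory", "w");
    if (!f) {
        perror("compact_memory");
        return EXIT_FAILURE;
    }
    fputs("1\n", f);
    fclose(f);

    dump_buddyinfo("after compaction");
    return 0;
}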

The kernel's basic anti-fragmentation strategy is to group pages by mobility. Migrating physical pages used by the kernel itself would be difficult and complex to implement, so the current kernel simply does not migrate them. Pages used by user processes, on the other hand, are accessed through user page tables; their mappings can be moved and remapped without the user process noticing. Memory compaction is therefore built on top of page migration.

Memory compaction implementation:

An important use case for memory compaction is high-order allocations (order > 1): when the allocation fails at the WMARK_LOW watermark and still cannot be satisfied after waking the kswapd kernel thread, __alloc_pages_direct_compact() is called to compact memory and then retry the allocation. Below we follow the kernel path alloc_pages()->...->__alloc_pages_direct_compact() to see how memory compaction works.

[mm/page_alloc.c]

[alloc_pages()->alloc_pages_node()->__alloc_pages()->__alloc_pages_nodemask()->__alloc_pages_slowpath()->__alloc_pages_direct_compact()]

/* Try memory compaction for high-order allocations before reclaim */
/* The mode parameter is the migration_mode, normally passed down from
 * __alloc_pages_slowpath() with the value MIGRATE_ASYNC. */
static struct page *
__alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
        int alloc_flags, const struct alloc_context *ac,
        enum migrate_mode mode, int *contended_compaction,
        bool *deferred_compaction)
{
    unsigned long compact_result;
    struct page *page;

    /* Memory compaction targets high-order allocations, so an order-0
     * allocation does not trigger compaction. */
    if (!order)
        return NULL;

    current->flags |= PF_MEMALLOC;
    /* PF_MEMALLOC must be set while try_to_compact_pages() runs; the flag is
     * consulted during page migration to avoid deadlocks on the page lock
     * (PG_locked). See the implementation of try_to_compact_pages() below. */
    compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
                        mode, contended_compaction);
    current->flags &= ~PF_MEMALLOC;

    switch (compact_result) {
    case COMPACT_DEFERRED:
        *deferred_compaction = true;
        /* fall-through */
    case COMPACT_SKIPPED:
        return NULL;
    default:
        break;
    }

    /*
     * At least in one zone compaction wasn't deferred or skipped, so let's
     * count a compaction stall
     */
    count_vm_event(COMPACTSTALL);

    /* After compaction has run, call get_page_from_freelist() to retry the
     * allocation; on success the struct page of the first page is returned. */
    page = get_page_from_freelist(gfp_mask, order,
                    alloc_flags & ~ALLOC_NO_WATERMARKS, ac);

    if (page) {
        struct zone *zone = page_zone(page);

        zone->compact_blockskip_flush = false;
        compaction_defer_reset(zone, order, true);
        count_vm_event(COMPACTSUCCESS);
        return page;
    }

    /*
     * It's bad if compaction run occurs and fails. The most likely reason
     * is that pages exist, but not enough to satisfy watermarks.
     */
    count_vm_event(COMPACTFAIL);

    cond_resched();

    return NULL;
}

Implementation of try_to_compact_pages():

[__alloc_pages_direct_compact()->try_to_compact_pages()]

/**
 * try_to_compact_pages - Direct compact to satisfy a high-order allocation
 * @gfp_mask: The GFP mask of the current allocation
 * @order: The order of the current allocation
 * @alloc_flags: The allocation flags of the current allocation
 * @ac: The context of current allocation
 * @mode: The migration mode for async, sync light, or sync migration
 * @contended: Return value that determines if compaction was aborted due to
 *             need_resched() or lock contention
 *
 * This is the main entry point for direct page compaction.
 */
unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
            int alloc_flags, const struct alloc_context *ac,
            enum migrate_mode mode, int *contended)
{
    int may_enter_fs = gfp_mask & __GFP_FS;
    int may_perform_io = gfp_mask & __GFP_IO;
    struct zoneref *z;
    struct zone *zone;
    int rc = COMPACT_DEFERRED;
    int all_zones_contended = COMPACT_CONTENDED_LOCK; /* init for &= op */

    *contended = COMPACT_CONTENDED_NONE;

    /* Check if the GFP flags allow compaction */
    if (!order || !may_enter_fs || !may_perform_io)
        return COMPACT_SKIPPED;

    trace_mm_compaction_try_to_compact_pages(order, gfp_mask, mode);

    /* Compact each zone in the list */
    /* The for_each_zone_zonelist_nodemask() macro decides, based on the
     * allocation mask, which zones need to be scanned. */
    for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
                                ac->nodemask) {
        int status;
        int zone_contended;

        if (compaction_deferred(zone, order))
            continue;

        /* compact_zone_order() runs compaction on one particular zone;
         * see its implementation below. */
        status = compact_zone_order(zone, order, gfp_mask, mode,
                &zone_contended, alloc_flags,
                ac->classzone_idx);
        rc = max(status, rc);
        /*
         * It takes at least one zone that wasn't lock contended
         * to clear all_zones_contended.
         */
        all_zones_contended &= zone_contended;

        /* If a normal allocation would succeed, stop compacting */
        /* zone_watermark_ok() checks whether the zone is now above the
         * WMARK_LOW watermark; if so, leave the loop. */
        if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
                    ac->classzone_idx, alloc_flags)) {
            /*
             * We think the allocation will succeed in this zone,
             * but it is not certain, hence the false. The caller
             * will repeat this with true if allocation indeed
             * succeeds in this zone.
             */
            compaction_defer_reset(zone, order, false);
            /*
             * It is possible that async compaction aborted due to
             * need_resched() and the watermarks were ok thanks to
             * somebody else freeing memory. The allocation can
             * however still fail so we better signal the
             * need_resched() contention anyway (this will not
             * prevent the allocation attempt).
             */
            if (zone_contended == COMPACT_CONTENDED_SCHED)
                *contended = COMPACT_CONTENDED_SCHED;

            goto break_loop;
        }

        if (mode != MIGRATE_ASYNC && status == COMPACT_COMPLETE) {
            /*
             * We think that allocation won't succeed in this zone
             * so we defer compaction there. If it ends up
             * succeeding after all, it will be reset.
             */
            defer_compaction(zone, order);
        }

        /*
         * We might have stopped compacting due to need_resched() in
         * async compaction, or due to a fatal signal detected. In that
         * case do not try further zones and signal need_resched()
         * contention.
         */
        if ((zone_contended == COMPACT_CONTENDED_SCHED)
                    || fatal_signal_pending(current)) {
            *contended = COMPACT_CONTENDED_SCHED;
            goto break_loop;
        }

        continue;
break_loop:
        /*
         * We might not have tried all the zones, so be conservative
         * and assume they are not all lock contended.
         */
        all_zones_contended = 0;
        break;
    }

    /*
     * If at least one zone wasn't deferred or skipped, we report if all
     * zones that were tried were lock contended.
     */
    if (rc > COMPACT_SKIPPED && all_zones_contended)
        *contended = COMPACT_CONTENDED_LOCK;

    return rc;
}

Back to __alloc_pages_direct_compact().

Implementation of compact_zone_order():

[__alloc_pages_direct_compact()->try_to_compact_pages()->compact_zone_order()]

As in the kswapd code, a control structure struct compact_control cc is defined here to carry the parameters. cc.migratepages is the list of pages to be migrated, and cc.freepages is the list of free destination pages the data will be migrated into. A trimmed-down sketch of the structure follows.
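For reference, here is an abridged sketch of struct compact_control showing only the fields this walkthrough touches; the full definition lives in mm/internal.h and contains additional members that vary between kernel versions.

/* Abridged sketch of struct compact_control (see mm/internal.h for the full
 * definition); only the fields used in this chapter are shown. */
struct compact_control {
    struct list_head freepages;     /* isolated free pages, migration targets */
    struct list_head migratepages;  /* isolated pages waiting to be migrated */
    unsigned long nr_freepages;     /* number of pages on freepages */
    unsigned long nr_migratepages;  /* number of pages on migratepages */
    unsigned long free_pfn;         /* free scanner position, moves downwards */
    unsigned long migrate_pfn;      /* migrate scanner position, moves upwards */
    enum migrate_mode mode;         /* async / sync-light / sync migration */
    int order;                      /* order of the pending allocation, or -1 */
    gfp_t gfp_mask;                 /* gfp mask of the pending allocation */
    int alloc_flags;                /* alloc flags of the pending allocation */
    int classzone_idx;              /* zone index used for watermark checks */
    struct zone *zone;              /* zone being compacted */
    int contended;                  /* COMPACT_CONTENDED_* abort reason */
};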

static unsigned long compact_zone_order(struct zone *zone, int order,
        gfp_t gfp_mask, enum migrate_mode mode, int *contended,
        int alloc_flags, int classzone_idx)
{
    unsigned long ret;
    struct compact_control cc = {
        .nr_freepages = 0,
        .nr_migratepages = 0,
        .order = order,
        .gfp_mask = gfp_mask,
        .zone = zone,
        .mode = mode,
        .alloc_flags = alloc_flags,
        .classzone_idx = classzone_idx,
    };
    INIT_LIST_HEAD(&cc.freepages);
    INIT_LIST_HEAD(&cc.migratepages);

    /* See the implementation of compact_zone() below. */
    ret = compact_zone(zone, &cc);

    VM_BUG_ON(!list_empty(&cc.freepages));
    VM_BUG_ON(!list_empty(&cc.migratepages));

    *contended = cc.contended;
    return ret;
}

Back to try_to_compact_pages().

Implementation of compact_zone():

[alloc_pages()->alloc_pages_node()->__alloc_pages()->__alloc_pages_nodemask()->__alloc_pages_slowpath()->__alloc_pages_direct_compact()->try_to_compact_pages()->compact_zone_order()->compact_zone()]

static int compact_zone(struct zone *zone, struct compact_control *cc)
{
    int ret;
    unsigned long start_pfn = zone->zone_start_pfn;
    unsigned long end_pfn = zone_end_pfn(zone);
    const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
    const bool sync = cc->mode != MIGRATE_ASYNC;
    unsigned long last_migrated_pfn = 0;

    /* Decide, based on the current watermarks, whether compaction is worth
     * running at all; see the implementation of compaction_suitable() below. */
    ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
                            cc->classzone_idx);
    switch (ret) {
    case COMPACT_PARTIAL:
    case COMPACT_SKIPPED:
        /* Compaction is likely to fail */
        return ret;
    case COMPACT_CONTINUE:
        /* Fall through to compaction */
        ;
    }

    /*
     * Clear pageblock skip if there were failures recently and compaction
     * is about to be retried after being deferred. kswapd does not do
     * this reset as it'll reset the cached information when going to sleep.
     */
    if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
        __reset_isolation_suitable(zone);

    /*
     * Setup to move all movable pages to the end of the zone. Used cached
     * information on where the scanners should start but check that it
     * is initialised by ensuring the values are within zone boundaries.
     */
    /* Set up cc->migrate_pfn and cc->free_pfn. In short, cc->migrate_pfn is
     * set to the first pfn of the zone (zone->zone_start_pfn): the migrate
     * scanner starts at the first page of the zone and looks for pages that
     * can be migrated. cc->free_pfn is set to the last pfn of the zone: the
     * free scanner starts at the end of the zone and looks for free pages to
     * use as migration targets. */
    cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
    cc->free_pfn = zone->compact_cached_free_pfn;
    if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
        cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
        zone->compact_cached_free_pfn = cc->free_pfn;
    }
    if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
        cc->migrate_pfn = start_pfn;
        zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
        zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
    }

    trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
                cc->free_pfn, end_pfn, sync);

    migrate_prep_local();

    /* The while loop scans from the start of the zone for pages suitable for
     * migration and tries to migrate them into free pages at the end of the
     * zone, until the zone is back above the WMARK_LOW watermark.
     * compact_finished() decides whether compaction can stop; see its
     * implementation below. */
    while ((ret = compact_finished(zone, cc, migratetype)) ==
                        COMPACT_CONTINUE) {
        int err;
        unsigned long isolate_start_pfn = cc->migrate_pfn;

        /* isolate_migratepages() scans the zone for migratable pages and
         * adds them to the cc->migratepages list; see its implementation
         * below. */
        switch (isolate_migratepages(zone, cc)) {
        case ISOLATE_ABORT:
            ret = COMPACT_PARTIAL;
            putback_movable_pages(&cc->migratepages);
            cc->nr_migratepages = 0;
            goto out;
        case ISOLATE_NONE:
            /*
             * We haven't isolated and migrated anything, but
             * there might still be unflushed migrations from
             * previous cc->order aligned block.
             */
            goto check_drain;
        case ISOLATE_SUCCESS:
            ;
        }

        /* The core page-migration step: take pages off the cc->migratepages
         * list and try to migrate them; see the discussion of
         * migrate_pages() below. */
        err = migrate_pages(&cc->migratepages, compaction_alloc,
                compaction_free, (unsigned long)cc, cc->mode,
                MR_COMPACTION);

        trace_mm_compaction_migratepages(cc->nr_migratepages, err,
                            &cc->migratepages);

        /* All pages were either migrated or will be released */
        cc->nr_migratepages = 0;
        /* Handle migration failures: pages that were not migrated are put
         * back on the appropriate LRU lists. */
        if (err) {
            putback_movable_pages(&cc->migratepages);
            /*
             * migrate_pages() may return -ENOMEM when scanners meet
             * and we want compact_finished() to detect it
             */
            if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
                ret = COMPACT_PARTIAL;
                goto out;
            }
        }

        /*
         * Record where we could have freed pages by migration and not
         * yet flushed them to buddy allocator. We use the pfn that
         * isolate_migratepages() started from in this loop iteration
         * - this is the lowest page that could have been isolated and
         * then freed by migration.
         */
        if (!last_migrated_pfn)
            last_migrated_pfn = isolate_start_pfn;

check_drain:
        /*
         * Has the migration scanner moved away from the previous
         * cc->order aligned block where we migrated from? If yes,
         * flush the pages that were freed, so that they can merge and
         * compact_finished() can detect immediately if allocation
         * would succeed.
         */
        if (cc->order > 0 && last_migrated_pfn) {
            int cpu;
            unsigned long current_block_start =
                cc->migrate_pfn & ~((1UL << cc->order) - 1);

            if (last_migrated_pfn < current_block_start) {
                cpu = get_cpu();
                lru_add_drain_cpu(cpu);
                drain_local_pages(zone);
                put_cpu();
                /* No more flushing until we migrate again */
                last_migrated_pfn = 0;
            }
        }
    }

out:
    /*
     * Release free pages and update where the free scanner should restart,
     * so we don't leave any returned pages behind in the next attempt.
     */
    if (cc->nr_freepages > 0) {
        unsigned long free_pfn = release_freepages(&cc->freepages);

        cc->nr_freepages = 0;
        VM_BUG_ON(free_pfn == 0);
        /* The cached pfn is always the first in a pageblock */
        free_pfn &= ~(pageblock_nr_pages-1);
        /*
         * Only go back, not forward. The cached pfn might have been
         * already reset to zone end in compact_finished()
         */
        if (free_pfn > zone->compact_cached_free_pfn)
            zone->compact_cached_free_pfn = free_pfn;
    }

    trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
                cc->free_pfn, end_pfn, sync, ret);

    return ret;
}

Back to compact_zone_order().

Implementation of compaction_suitable(): decides from the current watermarks whether memory compaction should be run.

unsigned long compaction_suitable(struct zone *zone, int order,
                    int alloc_flags, int classzone_idx)
{
    unsigned long ret;

    ret = __compaction_suitable(zone, order, alloc_flags, classzone_idx);
    trace_mm_compaction_suitable(zone, order, ret);
    if (ret == COMPACT_NOT_SUITABLE_ZONE)
        ret = COMPACT_SKIPPED;

    return ret;
}

/*
 * compaction_suitable: Is this suitable to run compaction on this zone now?
 * Returns
 *   COMPACT_SKIPPED  - If there are too few free pages for compaction
 *   COMPACT_PARTIAL  - If the allocation would succeed without compaction
 *   COMPACT_CONTINUE - If compaction should run now
 */
static unsigned long __compaction_suitable(struct zone *zone, int order,
                    int alloc_flags, int classzone_idx)
{
    int fragindex;
    unsigned long watermark;

    /*
     * order == -1 is expected when compacting via
     * /proc/sys/vm/compact_memory
     */
    if (order == -1)
        return COMPACT_CONTINUE;

    /* The three checks below all use the WMARK_LOW watermark as baseline. */
    watermark = low_wmark_pages(zone);
    /*
     * If watermarks for high-order allocation are already met, there
     * should be no need for compaction at all.
     */
    /* (1) Check, for the requested order, whether the zone is already above
     * WMARK_LOW; if so, return COMPACT_PARTIAL: no compaction is needed. */
    if (zone_watermark_ok(zone, order, watermark, classzone_idx,
                                alloc_flags))
        return COMPACT_PARTIAL;

    /*
     * Watermarks for order-0 must be met for compaction. Note the 2UL.
     * This is because during migration, copies of pages need to be
     * allocated and for a short time, the footprint is higher
     */
    /* (2) Then check, for order 0, whether the zone is above
     * WMARK_LOW + (2 << order). If not, the zone has too few free pages for
     * compaction to work with; return COMPACT_SKIPPED to skip it. */
    watermark += (2UL << order);
    if (!zone_watermark_ok(zone, 0, watermark, classzone_idx, alloc_flags))
        return COMPACT_SKIPPED;

    /* (3) Otherwise return COMPACT_CONTINUE: the zone can be compacted,
     * subject to the fragmentation-index check below. */
    /*
     * fragmentation index determines if allocation failures are due to
     * low memory or external fragmentation
     *
     * index of -1000 would imply allocations might succeed depending on
     * watermarks, but we already failed the high-order watermark check
     * index towards 0 implies failure is due to lack of memory
     * index towards 1000 implies failure is due to fragmentation
     *
     * Only compact if a failure would be due to fragmentation.
     */
    fragindex = fragmentation_index(zone, order);
    if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
        return COMPACT_NOT_SUITABLE_ZONE;

    return COMPACT_CONTINUE;
}

Back to compact_zone().

Implementation of compact_finished():

static int compact_finished(struct zone *zone, struct compact_control *cc,
                const int migratetype)
{
    int ret;

    ret = __compact_finished(zone, cc, migratetype);
    trace_mm_compaction_finished(zone, cc->order, ret);
    if (ret == COMPACT_NO_SUITABLE_PAGE)
        ret = COMPACT_CONTINUE;

    return ret;
}

static int __compact_finished(struct zone *zone, struct compact_control *cc,
                const int migratetype)
{
    unsigned int order;
    unsigned long watermark;

    if (cc->contended || fatal_signal_pending(current))
        return COMPACT_PARTIAL;

    /* Compaction run completes if the migrate and free scanner meet */
    /* There are two termination conditions:
     * (1) cc->migrate_pfn and cc->free_pfn meet: the two scanners start at
     *     opposite ends of the zone and move towards each other.
     * (2) The zone is above the WMARK_LOW watermark for cc->order. In that
     *     case check whether the buddy system has a suitable free block:
     *     ideally the free_area list of exactly cc->order has a free block
     *     of the requested migratetype
     *     (zone->free_area[order].free_list[MIGRATE_MOVABLE]), or a
     *     higher-order list does, or a list at or above pageblock_order has
     *     any free pages at all. */
    if (cc->free_pfn <= cc->migrate_pfn) {
        /* Let the next compaction start anew. */
        zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
        zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
        zone->compact_cached_free_pfn = zone_end_pfn(zone);

        /*
         * Mark that the PG_migrate_skip information should be cleared
         * by kswapd when it goes to sleep. kswapd does not set the
         * flag itself as the decision to be clear should be directly
         * based on an allocation request.
         */
        if (!current_is_kswapd())
            zone->compact_blockskip_flush = true;

        return COMPACT_COMPLETE;
    }

    /*
     * order == -1 is expected when compacting via
     * /proc/sys/vm/compact_memory
     */
    if (cc->order == -1)
        return COMPACT_CONTINUE;

    /* Compaction run is not finished if the watermark is not met */
    watermark = low_wmark_pages(zone);

    if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
                            cc->alloc_flags))
        return COMPACT_CONTINUE;

    /* Direct compactor: Is a suitable page free? */
    for (order = cc->order; order < MAX_ORDER; order++) {
        struct free_area *area = &zone->free_area[order];

        /* Job done if page is free of the right migratetype */
        if (!list_empty(&area->free_list[migratetype]))
            return COMPACT_PARTIAL;

        /* Job done if allocation would set block type */
        if (order >= pageblock_order && area->nr_free)
            return COMPACT_PARTIAL;
    }

    return COMPACT_NO_SUITABLE_PAGE;
}

Back to compact_zone().

Implementation of isolate_migratepages():

This function scans for pages suitable for migration, starting at the head of the zone and advancing in steps of pageblock_nr_pages. The Linux kernel manages page mobility at pageblock granularity. The migrate types include MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_PCPTYPES and MIGRATE_CMA, and the kernel provides two helpers to manage them, get_pageblock_migratetype() and set_pageblock_migratetype(). At initialisation all pages are initially marked MIGRATE_MOVABLE; see memmap_init_zone() in mm/page_alloc.c. pageblock_nr_pages is usually 1024 pages, i.e. 1UL << (MAX_ORDER-1). A sketch of the migrate types is shown below.
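For reference, the following is an abridged sketch of the migratetype enumeration from include/linux/mmzone.h in a kernel of this vintage; the exact set of values depends on the kernel version and on configuration options such as CONFIG_CMA.

/* Abridged sketch of the pageblock migrate types (include/linux/mmzone.h);
 * the exact list depends on kernel version and configuration. */
enum {
    MIGRATE_UNMOVABLE,      /* e.g. kernel allocations, cannot be moved */
    MIGRATE_RECLAIMABLE,    /* can be freed by reclaim (e.g. file caches) */
    MIGRATE_MOVABLE,        /* user pages, can be migrated */
    MIGRATE_PCPTYPES,       /* number of types kept on per-cpu page lists */
    MIGRATE_RESERVE = MIGRATE_PCPTYPES,
#ifdef CONFIG_CMA
    MIGRATE_CMA,            /* contiguous memory allocator region */
#endif
#ifdef CONFIG_MEMORY_ISOLATION
    MIGRATE_ISOLATE,        /* temporarily isolated, e.g. for memory hotplug */
#endif
    MIGRATE_TYPES
};

/* The per-pageblock type is stored in zone->pageblock_flags and is accessed
 * with get_pageblock_migratetype(page) and
 * set_pageblock_migratetype(page, migratetype). */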

[alloc_pages()->alloc_pages_node()->__alloc_pages()->__alloc_pages_nodemask()->__alloc_pages_slowpath()->__alloc_pages_direct_compact()->try_to_compact_pages()->compact_zone_order()->compact_zone()->isolate_migratepages()]

/*
 * Isolate all pages that can be migrated from the first suitable block,
 * starting at the block pointed to by the migrate scanner pfn within
 * compact_control.
 */
static isolate_migrate_t isolate_migratepages(struct zone *zone,
                    struct compact_control *cc)
{
    unsigned long low_pfn, end_pfn;
    struct page *page;
    /* Choose the isolation mode; normally this is ISOLATE_ASYNC_MIGRATE. */
    const isolate_mode_t isolate_mode =
        (cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);

    /*
     * Start at where we last stopped, or beginning of the zone as
     * initialized by compact_zone()
     */
    low_pfn = cc->migrate_pfn;

    /* Only scan within a pageblock boundary */
    end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);

    /*
     * Iterate over whole pageblocks until we find the first suitable.
     * Do not cross the free scanner.
     */
    /* Scan from cc->migrate_pfn at the head of the zone towards its tail,
     * one pageblock (pageblock_nr_pages) at a time. */
    for (; end_pfn <= cc->free_pfn;
            low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {

        /*
         * This can potentially iterate a massively long zone with
         * many pageblocks unsuitable, so periodically check if we
         * need to schedule, or even abort async compaction.
         */
        if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))
                        && compact_should_abort(cc))
            break;

        page = pageblock_pfn_to_page(low_pfn, end_pfn, zone);
        if (!page)
            continue;

        /* If isolation recently failed, do not retry */
        if (!isolation_suitable(cc, page))
            continue;

        /*
         * For async compaction, also only scan in MOVABLE blocks.
         * Async compaction is optimistic to see if the minimum amount
         * of work satisfies the allocation.
         */
        /* Check whether the pageblock is MIGRATE_MOVABLE or MIGRATE_CMA,
         * since only those two types can be migrated. cc->mode is passed
         * down from __alloc_pages_slowpath(); the migration_mode is usually
         * asynchronous, i.e. MIGRATE_ASYNC. */
        if (cc->mode == MIGRATE_ASYNC &&
            !migrate_async_suitable(get_pageblock_migratetype(page)))
            continue;

        /* Perform the isolation */
        /* Scan the pageblock and isolate the pages that are suitable for
         * migration; see isolate_migratepages_block() below. */
        low_pfn = isolate_migratepages_block(cc, low_pfn, end_pfn,
                                isolate_mode);
        if (!low_pfn || cc->contended) {
            acct_isolated(zone, cc);
            return ISOLATE_ABORT;
        }

        /*
         * Either we isolated something and proceed with migration. Or
         * we failed and compact_zone should decide if we should
         * continue or not.
         */
        break;
    }

    acct_isolated(zone, cc);
    /*
     * Record where migration scanner will be restarted. If we end up in
     * the same pageblock as the free scanner, make the scanners fully
     * meet so that compact_finished() terminates compaction.
     */
    cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;

    return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
}

Back to compact_zone().

Implementation of isolate_migratepages_block():

[alloc_pages()->alloc_pages_node()->__alloc_pages()->__alloc_pages_nodemask()->__alloc_pages_slowpath()->__alloc_pages_direct_compact()->try_to_compact_pages()->compact_zone_order()->compact_zone()->isolate_migratepages()->isolate_migratepages_block()]

/**
 * isolate_migratepages_block() - isolate all migrate-able pages within
 *                                a single pageblock
 * @cc:         Compaction control structure.
 * @low_pfn:    The first PFN to isolate
 * @end_pfn:    The one-past-the-last PFN to isolate, within same pageblock
 * @isolate_mode: Isolation mode to be used.
 *
 * Isolate all pages that can be migrated from the range specified by
 * [low_pfn, end_pfn). The range is expected to be within same pageblock.
 * Returns zero if there is a fatal signal pending, otherwise PFN of the
 * first page that was not scanned (which may be both less, equal to or more
 * than end_pfn).
 *
 * The pages are isolated on cc->migratepages list (not required to be empty),
 * and cc->nr_migratepages is updated accordingly. The cc->migrate_pfn field
 * is neither read nor updated.
 */
static unsigned long
isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
            unsigned long end_pfn, isolate_mode_t isolate_mode)
{
    struct zone *zone = cc->zone;
    unsigned long nr_scanned = 0, nr_isolated = 0;
    struct list_head *migratelist = &cc->migratepages;
    struct lruvec *lruvec;
    unsigned long flags = 0;
    bool locked = false;
    struct page *page = NULL, *valid_page = NULL;
    unsigned long start_pfn = low_pfn;

    /*
     * Ensure that there are not too many pages isolated from the LRU
     * list by either parallel reclaimers or compaction. If there are,
     * delay for some time until fewer pages are isolated
     */
    /* too_many_isolated() checks whether too many pages are currently
     * isolated from the LRU lists; if so, sleep for about 100ms
     * (congestion_wait). In asynchronous mode (MIGRATE_ASYNC) just bail out
     * instead. */
    while (unlikely(too_many_isolated(zone))) {
        /* async migration should just abort */
        if (cc->mode == MIGRATE_ASYNC)
            return 0;

        congestion_wait(BLK_RW_ASYNC, HZ/10);

        if (fatal_signal_pending(current))
            return 0;
    }

    if (compact_should_abort(cc))
        return 0;

    /* Time to isolate some pages for migration */
    /* This loop scans the pageblock looking for pages that can be migrated. */
    for (; low_pfn < end_pfn; low_pfn++) {
        /*
         * Periodically drop the lock (if held) regardless of its
         * contention, to give chance to IRQs. Abort async compaction
         * if contended.
         */
        if (!(low_pfn % SWAP_CLUSTER_MAX)
            && compact_unlock_should_abort(&zone->lru_lock, flags,
                                &locked, cc))
            break;

        if (!pfn_valid_within(low_pfn))
            continue;
        nr_scanned++;

        page = pfn_to_page(low_pfn);

        if (!valid_page)
            valid_page = page;

        /*
         * Skip if free. We read page order here without zone lock
         * which is generally unsafe, but the race window is small and
         * the worst thing that can happen is that we skip some
         * potential isolation targets.
         */
        /* A page still in the buddy system is not a migration candidate;
         * skip it. page_order_unsafe() reads its order so the loop can jump
         * over the whole free block. */
        if (PageBuddy(page)) {
            unsigned long freepage_order = page_order_unsafe(page);

            /*
             * Without lock, we cannot be sure that what we got is
             * a valid page order. Consider only values in the
             * valid order range to prevent low_pfn overflow.
             */
            if (freepage_order > 0 && freepage_order < MAX_ORDER)
                low_pfn += (1UL << freepage_order) - 1;
            continue;
        }

        /*
         * Check may be lockless but that's ok as we recheck later.
         * It's possible to migrate LRU pages and balloon pages
         * Skip any other type of page
         */
        /* Only pages on the LRU lists and balloon pages can be migrated;
         * every other kind of page is skipped. */
        if (!PageLRU(page)) {
            if (unlikely(balloon_page_movable(page))) {
                if (balloon_page_isolate(page)) {
                    /* Successfully isolated */
                    goto isolate_success;
                }
            }
            continue;
        }

        /*
         * PageLRU is set. lru_lock normally excludes isolation
         * splitting and collapsing (collapsing has already happened
         * if PageLRU is set) but the lock is not necessarily taken
         * here and it is wasteful to take it just to check transhuge.
         * Check TransHuge without lock and skip the whole pageblock if
         * it's either a transhuge or hugetlbfs page, as calling
         * compound_order() without preventing THP from splitting the
         * page underneath us may return surprising results.
         */
        if (PageTransHuge(page)) {
            if (!locked)
                low_pfn = ALIGN(low_pfn + 1,
                        pageblock_nr_pages) - 1;
            else
                low_pfn += (1 << compound_order(page)) - 1;

            continue;
        }

        /*
         * Migration will fail if an anonymous page is pinned in memory,
         * so avoid taking lru_lock and isolating it unnecessarily in an
         * admittedly racy check.
         */
        /* Pages in the buddy system and pages not on the LRU lists have
         * already been filtered out; what remains are good candidates, but
         * a few special cases must still be excluded. page_mapping()
         * returning NULL suggests an anonymous page. For an anonymous page,
         * normally page_count(page) == page_mapcount(page) + 1, i.e.
         * page->_count == page->_mapcount + 1; if page_count is larger,
         * someone else in the kernel holds an extra reference on the page,
         * so it is not suitable for migration either. */
        if (!page_mapping(page) &&
            page_count(page) > page_mapcount(page))
            continue;

        /* If we already hold the lock, we can skip some rechecking */
        /* Take zone->lru_lock and recheck that the page is still on an LRU
         * list. */
        if (!locked) {
            locked = compact_trylock_irqsave(&zone->lru_lock,
                                &flags, cc);
            if (!locked)
                break;

            /* Recheck PageLRU and PageTransHuge under lock */
            if (!PageLRU(page))
                continue;
            if (PageTransHuge(page)) {
                low_pfn += (1 << compound_order(page)) - 1;
                continue;
            }
        }

        lruvec = mem_cgroup_page_lruvec(page, zone);

        /* Try isolate the page */
        /* __isolate_lru_page() isolates pages for the ISOLATE_ASYNC_MIGRATE
         * mode. As analysed earlier, a page under writeback is not an
         * acceptable candidate, and neither is a dirty page whose
         * mapping->a_ops->migratepage() is not implemented. On success it
         * also takes a reference on page->_count and clears the PG_lru
         * flag. */
        if (__isolate_lru_page(page, isolate_mode) != 0)
            continue;

        VM_BUG_ON_PAGE(PageTransCompound(page), page);

        /* Successfully isolated */
        /* Remove the page from its LRU list. */
        del_page_from_lru_list(page, lruvec, page_lru(page));

        /* The page is a valid migration candidate; add it to the
         * cc->migratepages list. */
isolate_success:
        list_add(&page->lru, migratelist);
        cc->nr_migratepages++;
        nr_isolated++;

        /* Avoid isolating too much */
        if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
            ++low_pfn;
            break;
        }
    }

    /* To sum up, a page is suitable for migration by compaction only if:
     * (1) it is on an LRU list (pages still in the buddy system are not);
     * (2) it is not under writeback (PG_writeback is not set);
     * (3) it is not marked PG_unevictable;
     * (4) it is not a dirty page whose mapping lacks a
     *     mapping->a_ops->migratepage() method. */

    /*
     * The PageBuddy() check could have potentially brought us outside
     * the range to be scanned.
     */
    if (unlikely(low_pfn > end_pfn))
        low_pfn = end_pfn;

    if (locked)
        spin_unlock_irqrestore(&zone->lru_lock, flags);

    /*
     * Update the pageblock-skip information and cached scanner pfn,
     * if the whole pageblock was scanned without isolating any page.
     */
    if (low_pfn == end_pfn)
        update_pageblock_skip(cc, valid_page, nr_isolated, true);

    trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
                        nr_scanned, nr_isolated);

    count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
    if (nr_isolated)
        count_compact_events(COMPACTISOLATED, nr_isolated);

    return low_pfn;
}

Back to compact_zone().

Implementation of migrate_pages(): this is the core page-migration function. It takes pages off the cc->migratepages list and tries to migrate them. compaction_alloc() looks for free pages starting from the end of the zone and adds them to the cc->freepages list.

migrate_pages() itself was already covered in the chapter on page migration. Here its get_new_page() callback is compaction_alloc(), its put_new_page() callback is compaction_free(), the migration mode is MIGRATE_ASYNC and the reason is MR_COMPACTION.

/*
 * This is a migrate-callback that "allocates" freepages by taking pages
 * from the isolated freelists in the block we are migrating to.
 */
/* compaction_alloc() looks for free pages starting from the tail of the
 * zone; its core helper is isolate_freepages(), which is very similar to
 * isolate_migratepages() above. compaction_alloc() returns one free page. */
static struct page *compaction_alloc(struct page *migratepage,
                    unsigned long data,
                    int **result)
{
    struct compact_control *cc = (struct compact_control *)data;
    struct page *freepage;

    /*
     * Isolate free pages if necessary, and if we are not aborting due to
     * contention.
     */
    if (list_empty(&cc->freepages)) {
        if (!cc->contended)
            isolate_freepages(cc);

        if (list_empty(&cc->freepages))
            return NULL;
    }

    freepage = list_entry(cc->freepages.next, struct page, lru);
    list_del(&freepage->lru);
    cc->nr_freepages--;

    return freepage;
}
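For completeness, the put_new_page() counterpart is compaction_free(), which hands a destination page that migrate_pages() ended up not needing back to cc->freepages. The following is a sketch based on the implementation of this kernel generation (mm/compaction.c); the exact body may differ slightly between versions.

/*
 * Sketch of the migrate-callback that "frees" a destination page back to the
 * isolated freelist; all pages on the list belong to the zone being
 * compacted, so no special handling is needed.
 */
static void compaction_free(struct page *page, unsigned long data)
{
    struct compact_control *cc = (struct compact_control *)data;

    /* Put the unused destination page back on cc->freepages so a later
     * migration in this compaction run can reuse it. */
    list_add(&page->lru, &cc->freepages);
    cc->nr_freepages++;
}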

總結

以上是生活随笔為你收集整理的17 内存规整(memory compaction)的全部內容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。

主站蜘蛛池模板: 黑丝少妇喷水 | 日本在线视频www | 国产精品一区二区小说 | 影音先锋男人资源网站 | 四虎8848| 色婷婷香蕉在线一区二区 | 国产精品福利电影 | 欧美韩一区二区 | 四虎影视www在线播放 | 轻轻色在线观看 | av在线免费观看一区 | 秋霞影院午夜老牛影院 | 色婷婷av一区二区三区gif | 国产成人免费 | 欧美一级艳片视频免费观看 | 欧美操操 | 免费在线观看黄色 | 亚洲一区日本 | 国产午夜免费福利 | 亚洲一区二区三区在线视频观看 | 国产在线色视频 | 色香蕉网站| 久久一区二 | 国产噜噜噜噜噜久久久久久久久 | 精品久久毛片 | 亚欧美| 一区二区三区av | 日本黄色片在线播放 | 一级网站在线观看 | 伊人久久超碰 | 粉嫩aⅴ一区二区三区 | 人妻精品久久久久中文字幕 | 51ⅴ精品国产91久久久久久 | 青青草公开视频 | 性色免费视频 | 国产日产亚洲精品 | 国产精品一区二区三区四区在线观看 | 久久伊人热| 一色屋免费视频 | 免费麻豆视频 | 亚洲视频 一区 | 成人玩具h视频 | 奇米影视四色7777 | 国产高潮失禁喷水爽到抽搐 | 在线观看www视频 | 最好看十大无码av | 日韩欧美亚洲国产精品字幕久久久 | 亚洲国产果冻传媒av在线观看 | 91大奶| 亚洲天堂社区 | 美女搡bbb又爽又猛又黄www | 色爽av| 伊人久久97 | 超碰8| 亚洲一区中文字幕永久在线 | 久久一道本 | 在线播放一区 | 欧美日韩免费看 | 中文字幕免费高 | 麻豆久久久 | 久久久久久福利 | 亚洲911精品成人18网站 | 成av人片在线观看www | 精品久久久久亚洲 | 免费一级特黄特色毛片久久看 | 九九热最新视频 | 在线视频网站 | 91pron在线 | 亚洲91av| 开心激情亚洲 | 不卡中文| 中文字幕制服诱惑 | av全黄 | 色综合中文综合网 | 18我禁在线观看 | 特级毛片a | 羞羞答答一区 | av在线免费观看一区 | 男生和女生操操 | 国产按摩一区二区三区 | 成人欧美一区二区三区黑人孕妇 | 午夜性色福利视频 | 成人在线超碰 | 国产精选一区二区 | 在线免费观看视频网站 | 不卡av中文字幕 | 国产一级啪啪 | 美国av导航| 亚洲作爱 | 一级黄色片欧美 | www狠狠 | 精品国产三级a∨在线 | 国产免费av片在线观看 | 亚洲一区精品在线观看 | 日韩欧美猛交xxxxx无码 | 中文免费在线观看 | 精品视频免费 | 亚洲天天视频 | 在线天堂v |