A Brief Overview of the Hot/Cold Page Mechanism in Linux

2024-09-01 13:48:09

Source: reposted

Contributed by: a site user

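What are hot and cold pages?

The buddy system, which manages physical memory in the Linux kernel, classifies free pages as hot or cold. A cold page is a free page that is no longer in the CPU cache (usually the L2 cache); a hot page is a free page that is still in the cache. Hot and cold pages are tracked per CPU: every zone sets up a per-cpu-pageset of hot and cold pages for each CPU.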

Why have hot and cold pages?

They serve three purposes:

First, cache efficiency. When the buddy allocator hands out an order-0 free page and that page is hot, it is already in the L2 cache, so a CPU write does not have to read the memory contents into the cache first and then write. A cold page, by contrast, is not in the L2 cache. It is easy to see why hot pages are preferred in the common case. So when should a cold page be used? To quote: While allocating a physical page frame, there is a bit specifying whether we would like a hot or a cold page (that is, a page likely to be in the CPU cache, or a page not likely to be there). If the page will be used by the CPU, a hot page will be faster. If the page will be used for device DMA the CPU cache would be invalidated anyway, and a cold page does not waste precious cache contents.

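In short: when the kernel allocates a physical page frame, a flag bit says whether a hot or a cold page is wanted. If the CPU will use the page, a hot page is allocated. If the page is for device DMA, a cold page is allocated: the CPU cache for that page would be invalidated anyway, so handing out a hot page would only waste cache contents.

Second, less lock contention. When the buddy system allocates free pages from a zone on behalf of a process, it must first take that zone's spinlock and then allocate. Processes on several CPUs that allocate at the same time therefore contend for the lock. With the per-cpu-pageset, each CPU can serve such allocations from its own lists without touching the zone lock, which improves efficiency. Likewise, when a single page is freed, it goes back to the per-cpu-pageset first, which reduces use of the zone spinlock. Only when the number of cached pages exceeds a threshold are pages returned to the buddy system.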

Third, per-CPU hot and cold pages tend to keep a given page tied to one CPU, which helps improve the cache hit rate.

Data structure for hot and cold pages

    struct per_cpu_pages {
        int count;    /* number of pages in the list */
        int high;     /* high watermark, emptying needed */
        int batch;    /* chunk size for buddy add/remove */

        /*
         * Lists of pages, one per migrate type stored on the pcp-lists.
         * Each CPU has MIGRATE_PCPTYPES hot/cold page lists in every zone
         * (split by migrate type).
         */
        struct list_head lists[MIGRATE_PCPTYPES];
    };

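In Linux, on UMA architectures, hot and cold pages are kept on the same list (one such list per migrate type), with hot pages at the front and cold pages at the back. Each time a CPU frees an order-0 page, if the number of pages in the per-cpu-pageset is below its threshold, the freed page is inserted at the head of the list. Pages inserted earlier are pushed toward the tail as newer hot pages keep arriving at the head, so the further back a page sits, the more likely it has gone from hot to cold.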

How hot and cold pages are allocated

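When allocating an order-0 page (the hot/cold page mechanism handles only single-page allocations), the kernel first finds a suitable zone and then picks the hot/cold page list that matches the requested migratetype. For each CPU, every zone has three such lists, one each for MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE. If a hot page is wanted, the page at the head of the list is taken (the "hottest" one); if a cold page is wanted, the page at the tail is taken (the "coldest" one).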

The allocation function (comments have been added to the key parts):

    /*
     * Really, prep_compound_page() should be called from __rmqueue_bulk(). But
     * we cheat by calling it from here, in the order > 0 path. Saves a branch
     * or two.
     */
    static inline
    struct page *buffered_rmqueue(struct zone *preferred_zone,
                struct zone *zone, int order, gfp_t gfp_flags,
                int migratetype)
    {
        unsigned long flags;
        struct page *page;
        /* A cold page is allocated only if __GFP_COLD is set */
        int cold = !!(gfp_flags & __GFP_COLD);

    again:
        if (likely(order == 0)) {
            struct per_cpu_pages *pcp;
            struct list_head *list;

            local_irq_save(flags);
            pcp = &this_cpu_ptr(zone->pageset)->pcp;
            list = &pcp->lists[migratetype];
            if (list_empty(list)) {
                /* The list is empty: refill it from the buddy system */
                pcp->count += rmqueue_bulk(zone, 0,
                        pcp->batch, list,
                        migratetype, cold);
                if (unlikely(list_empty(list)))
                    goto failed;
            }

            if (cold)
                /* Cold page: take it from the tail; list is the list head, so list->prev is the tail */
                page = list_entry(list->prev, struct page, lru);
            else
                /* Hot page: take it from the head */
                page = list_entry(list->next, struct page, lru);

            /* Remove the allocated page frame from the hot/cold page list */
            list_del(&page->lru);
            pcp->count--;
        } else {
            /* order != 0 (more than one page frame): bypass the hot/cold page lists */
            if (unlikely(gfp_flags & __GFP_NOFAIL)) {
                /*
                 * __GFP_NOFAIL is not to be used in new code.
                 *
                 * All __GFP_NOFAIL callers should be fixed so that they
                 * properly detect and handle allocation failures.
                 *
                 * We most definitely don't want callers attempting to
                 * allocate greater than order-1 page units with
                 * __GFP_NOFAIL.
                 */
                WARN_ON_ONCE(order > 1);
            }
            spin_lock_irqsave(&zone->lock, flags);
            page = __rmqueue(zone, order, migratetype);
            spin_unlock(&zone->lock);
            if (!page)
                goto failed;
            __mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
        }

        __count_zone_vm_events(PGALLOC, zone, 1 << order);
        zone_statistics(preferred_zone, zone, gfp_flags);
        local_irq_restore(flags);

        VM_BUG_ON(bad_range(zone, page));
        if (prep_new_page(page, order, gfp_flags))
            goto again;
        return page;

    failed:
        local_irq_restore(flags);
        return NULL;
    }
