
[mmu/cache]-ARM cache study notes: one article is all you need

Published: 2025/3/21 · Programming Q&A · 24 · 豆豆

This article, collected and compiled by 生活随笔, presents "[mmu/cache]-ARM cache study notes: one article is all you need". The editor found it worthwhile and shares it here for reference.

Note:
Unless stated otherwise, this article describes the ARMv8 AArch64 architecture and the 64-bit Linux kernel.
Related articles
1. ARM MMU study notes: one article is all you need

The author's self-made video course: 《armv8的VMSA/MMU/Cache介紹》 (Introduction to the ARMv8 VMSA/MMU/Cache).

Contents

- Use cases: when do you need to flush the cache
  - 1. Sharing data between different hardware blocks
  - 2. Sharing data between different systems (e.g. Linux / OP-TEE)
- Cache principles
  - 1. ARM cache hardware block diagram
  - 2. ARM cache hierarchy
  - 3. ARMv8 multi-level cache memory access diagram
  - 4. ARM cache terminology
  - 5. ARM cache organization (mapping schemes)
    - (1) Direct-mapped cache
    - (2) Two-way set-associative cache
    - (3) Fully associative cache
  - 6. ARM cache lookup example
  - 7. ARM cache attributes and concepts
    - (1) Cache types
    - (2) Read allocation and write allocation
    - (3) Write-through and write-back
    - (4) Inner and outer
    - (5) PoC and PoU
  - 8. ARM Cortex-A76 cache overview
  - 9. Summary of ARMv8 cache maintenance instructions
    - (1) Commonly used instructions
    - (2) ARMv8.5 Memory Tagging cache maintenance instructions

Use cases: when do you need to flush the cache

1. Sharing data between different hardware blocks

Scenario: the CPU writes a buffer of data at address src and hands it to the Crypto hardware for encryption or decryption. The hardware places the result at address dst, and the CPU then reads dst to get the result.

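(1) When the CPU writes to memory (address src), the data is held in the cache and is not immediately written to DDR (for normal write-back cacheable memory). It only reaches DDR when that line is evicted from the cache.

(2) There is no cache between the device and DDR, so the device reads directly from DDR (address src). At this point it naturally cannot see the data the CPU just wrote.
Fix: before the device reads from DDR, call __flush_dcache_area to flush the cached data out to memory.

(3) There is no cache between the device and DDR, so the device writes its output directly to DDR (address dst).

(4) When the CPU later reads address dst, if the cache already holds that address, the CPU takes the data straight from the cache instead of from DDR, so it does not see the data the device wrote to DDR.
Fix: after the device has written to DDR, call __invalid_dcache_area to invalidate the cached copy. The next CPU read then misses in the cache and fetches the data from DDR.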
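The four steps above can be reproduced with a small host-side C model. This is a sketch under stated assumptions, not driver code: "DDR" is a plain array, the CPU has a single write-back cache line, and `cache_clean` / `cache_invalidate` are hypothetical stand-ins. `cache_clean` models the clean part of what `__flush_dcache_area` does, and `cache_invalidate` models `__invalid_dcache_area`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: "DDR" is a byte array and the CPU has one write-back
 * cache line in front of it. The device (e.g. the Crypto engine) accesses
 * DDR directly and never sees the CPU cache. */
#define LINE_SIZE 64

static uint8_t ddr[4 * LINE_SIZE];

static struct {
    bool valid;
    bool dirty;
    size_t base;                      /* DDR offset of the cached line */
    uint8_t data[LINE_SIZE];
} cpu_line;

/* Make sure the line holding addr is cached, writing back a dirty victim. */
static void cpu_fill(size_t addr)
{
    size_t base = addr - addr % LINE_SIZE;
    assert(addr < sizeof ddr);
    if (cpu_line.valid && cpu_line.base == base)
        return;                                       /* hit */
    if (cpu_line.valid && cpu_line.dirty)             /* evict the old line */
        memcpy(&ddr[cpu_line.base], cpu_line.data, LINE_SIZE);
    memcpy(cpu_line.data, &ddr[base], LINE_SIZE);     /* refill from DDR */
    cpu_line.base = base;
    cpu_line.valid = true;
    cpu_line.dirty = false;
}

static void cpu_write(size_t addr, uint8_t v)
{
    cpu_fill(addr);
    cpu_line.data[addr % LINE_SIZE] = v;
    cpu_line.dirty = true;            /* write-back: DDR is not updated yet */
}

static uint8_t cpu_read(size_t addr)
{
    cpu_fill(addr);                   /* a hit returns the cached copy */
    return cpu_line.data[addr % LINE_SIZE];
}

/* Clean: push dirty data to DDR (the clean part of __flush_dcache_area). */
static void cache_clean(void)
{
    if (cpu_line.valid && cpu_line.dirty) {
        memcpy(&ddr[cpu_line.base], cpu_line.data, LINE_SIZE);
        cpu_line.dirty = false;
    }
}

/* Invalidate: drop the cached copy so the next CPU read refetches from DDR. */
static void cache_invalidate(void)
{
    cpu_line.valid = false;
}

/* Device accesses go straight to DDR. */
static uint8_t device_read(size_t addr) { return ddr[addr]; }
static void device_write(size_t addr, uint8_t v) { ddr[addr] = v; }
```

Running the steps in order with src at offset 0 and dst at offset 64: after `cpu_write(0, 0xAB)` the device still reads 0 from src until `cache_clean()` runs, and after `device_write(64, 0xCD)` the CPU keeps reading its stale cached 0 from dst until `cache_invalidate()` runs.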

2. Sharing data between different systems (e.g. Linux / OP-TEE)

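Consider a VIVT (Virtually Indexed, Virtually Tagged) cache. The Linux kernel has its own address space, and OP-TEE has another one.
Linux and OP-TEE communicate through shared memory. The same physical address is mapped at different virtual addresses in Linux and in OP-TEE, so the data at that physical address ends up in different cache lines for Linux and for OP-TEE:

When Linux writes to the shared region, the data does not go straight to physical memory; it goes into the cache. Only when those cache lines are evicted is the data actually written to memory.

When the TEE then reads the region and the access misses in the TEE's view of the cache, the TEE reads physical memory, which does not yet hold the valid data.
If the access hits, the TEE reads a cache line, but not the line in which Linux cached the shared buffer, so it still does not get the valid data.

Solution: after writing the data in Linux, flush the data cache (flush_dcache); before reading the data in OP-TEE, invalidate the data cache first (invalid_dcache).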
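Why the two mappings miss each other's lines can be seen from the index and tag bits alone. The sketch below assumes 64-byte lines and 256 sets (the L1 geometry used later in this article) and uses made-up addresses: a Linux VA `0xffff800010002000` and an OP-TEE VA `0x40101000` that both map a hypothetical PA `0x80001000`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 64-byte lines (6 offset bits) and 256 sets (8 index bits),
 * so the set index is address bits [13:6]. */
#define OFFSET_BITS 6
#define INDEX_BITS  8
static_assert(OFFSET_BITS + INDEX_BITS < 64, "fields must fit in 64 bits");

/* A VIVT cache feeds this the virtual address; a PIPT cache the physical one. */
static unsigned set_index(uint64_t addr)
{
    return (unsigned)((addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1));
}

/* VIVT tag: the virtual-address bits above the index. */
static uint64_t vivt_tag(uint64_t va)
{
    return va >> (OFFSET_BITS + INDEX_BITS);
}

/* Two VAs share a VIVT line only if both the index and the tag match. */
static int vivt_same_line(uint64_t va1, uint64_t va2)
{
    return set_index(va1) == set_index(va2) && vivt_tag(va1) == vivt_tag(va2);
}
```

With these values the Linux VA selects set 128 and the OP-TEE VA selects set 64, and their VIVT tags differ as well, so neither side can hit the line the other filled. A PIPT cache would index both accesses by the PA (set 64 here).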

Cache principles

1. ARM cache hardware block diagram

2. ARM cache hierarchy

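In the ARM designs described here, the cache has three levels: L1, L2 and L3.
The L1 cache is private to each ARM core and is split into an instruction cache (i-cache) and a data cache (d-cache).
The L2 cache is shared by all ARM cores in a cluster and is unified (not split into i-cache and d-cache).
The L3 cache is shared by all clusters.
This is a typical arrangement (for example in older big.LITTLE clusters) rather than an architectural requirement. In DynamIQ cores such as the Cortex-A76 below, the L2 is private to each core and the L3, in the DynamIQ Shared Unit, is shared by the cores of the cluster.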

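Taking the A76 core as an example:
(1) The L1 i-cache and L1 d-cache are each 64KB, 4-way set-associative with 256 sets, with 64-byte cache lines. This configuration is fixed by the ARM core and cannot be changed by the SoC.
(2) The L2 cache is 8-way set-associative, with a selectable size of 128KB, 256KB or 512KB. The ASIC designer picks one of these sizes when implementing the core.
(3) The L3 cache is designed on the SoC side, and its size is decided by the SoC vendor's ASIC design.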
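The set counts quoted above follow from size / (ways × line size). A small helper makes that arithmetic explicit and also derives how many address bits the offset and index fields use. It is a sketch that assumes power-of-two parameters and a 64-byte line for the L2 as well.

```c
#include <assert.h>
#include <stdint.h>

/* Cache geometry derived from size, associativity and line size.
 * All inputs are assumed to be powers of two. */
struct cache_geometry {
    uint32_t sets;
    unsigned offset_bits;   /* pick a byte within a line */
    unsigned index_bits;    /* pick a set */
};

static unsigned log2_pow2(uint32_t x)
{
    unsigned n = 0;
    assert(x != 0 && (x & (x - 1)) == 0);   /* power of two only */
    while (x >>= 1)
        n++;
    return n;
}

static struct cache_geometry compute_geometry(uint32_t size_bytes, uint32_t ways,
                                              uint32_t line_bytes)
{
    struct cache_geometry g;
    g.sets = size_bytes / (ways * line_bytes);   /* e.g. 64KB / (4 * 64B) = 256 */
    g.offset_bits = log2_pow2(line_bytes);
    g.index_bits = log2_pow2(g.sets);
    return g;
}
```

For the A76 L1 this gives 256 sets with 6 offset bits and 8 index bits; the three L2 sizes give 256, 512 and 1024 sets respectively.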

3. ARMv8 multi-level cache memory access diagram

In ARMv8, with a PIPT cache, when the CPU accesses a virtual address, the MMU first translates it into a physical address, and the data is then looked up in L1, L2 and L3 in turn.
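The PIPT access path just described (translate first, then probe each level) can be sketched as below. Everything here is a toy under stated assumptions: 4KB pages, 64-byte lines, a hypothetical four-entry flat `page_table`, and each cache level modelled only as a list of the physical line numbers it holds. Real L1 caches are often VIPT and start indexing in parallel with translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                /* assumed 4KB pages */
#define LINE_SHIFT 6                 /* assumed 64-byte lines */

/* Hypothetical flat page table: entry i holds the physical frame of virtual page i. */
static const uint64_t page_table[4] = { 0x80000, 0x80001, 0x90000, 0x90001 };

/* Step 1: the MMU translates VA -> PA (frame number + unchanged page offset). */
static uint64_t translate(uint64_t va)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    assert(vpn < sizeof page_table / sizeof page_table[0]);
    return (page_table[vpn] << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
}

/* A cache level, modelled only as the physical line numbers it currently holds. */
struct level {
    const uint64_t *lines;
    size_t count;
};

static int level_holds(const struct level *lv, uint64_t pa)
{
    for (size_t i = 0; i < lv->count; i++)
        if (lv->lines[i] == pa >> LINE_SHIFT)
            return 1;
    return 0;
}

/* Step 2: probe L1, L2, L3 in order with the physical address.
 * Returns the hitting level (1..nlevels), or 0 when the access goes to DDR. */
static int lookup(const struct level *levels, int nlevels, uint64_t va)
{
    uint64_t pa = translate(va);
    for (int i = 0; i < nlevels; i++)
        if (level_holds(&levels[i], pa))
            return i + 1;
    return 0;
}
```

With one line per level (PA `0x80000000` in L1, `0x80001040` in L2, `0x90000000` in L3), VA `0x1040` translates to PA `0x80001040`, misses L1 and hits in L2, so `lookup` returns 2; VA `0x3000` is held by no level and returns 0 (go to DDR).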

4. ARM cache terminology

What are set, way, line, index, tag and offset?

The concepts of cache line, entry, set and way

5. ARM cache organization (mapping schemes)

Direct-mapped cache
Two-way set-associative cache
Fully associative cache

(1) Direct-mapped cache

Pros: simpler hardware design and lower cost.
Cons: prone to cache thrashing, which hurts performance.

(2) Two-way set-associative cache

Pros: reduces how often cache thrashing occurs.
Cons: more complex hardware and higher cost (the tags of several cache lines must be compared).

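(3) Fully associative cache

Pros: reduces cache thrashing as far as possible.
Cons: the most complex hardware and the highest cost (the tags of all cache lines must be compared).
In this kind of cache the address has no index field.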
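The thrashing trade-off between the three organizations can be demonstrated with one toy model: a set-associative cache with LRU replacement, which is direct-mapped when `ways == 1` and fully associative when `sets == 1`. The model tracks tags only; its sizes, names and the LRU policy are illustrative assumptions, not how any particular ARM core replaces lines.

```c
#include <assert.h>
#include <stdint.h>

/* Toy set-associative cache with LRU replacement; it tracks tags only.
 * ways == 1 -> direct-mapped, sets == 1 -> fully associative. */
#define MAX_SETS 16
#define MAX_WAYS 4
#define LINE_SHIFT 6                            /* assumed 64-byte lines */

struct toy_cache {
    unsigned sets, ways, misses;
    int valid[MAX_SETS][MAX_WAYS];
    uint64_t tag[MAX_SETS][MAX_WAYS];
    unsigned age[MAX_SETS][MAX_WAYS];           /* larger = less recently used */
};

static void toy_init(struct toy_cache *c, unsigned sets, unsigned ways)
{
    assert(sets >= 1 && sets <= MAX_SETS && ways >= 1 && ways <= MAX_WAYS);
    *c = (struct toy_cache){ .sets = sets, .ways = ways };
}

static void toy_access(struct toy_cache *c, uint64_t addr)
{
    uint64_t line = addr >> LINE_SHIFT;
    unsigned set = (unsigned)(line % c->sets);  /* index (always 0 if fully associative) */
    uint64_t tag = line / c->sets;
    unsigned victim = 0;

    for (unsigned w = 0; w < c->ways; w++)
        c->age[set][w]++;
    for (unsigned w = 0; w < c->ways; w++) {    /* compare every tag in the set */
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->age[set][w] = 0;                 /* hit: now most recently used */
            return;
        }
    }
    c->misses++;
    for (unsigned w = 0; w < c->ways; w++) {    /* empty way first, else LRU */
        if (!c->valid[set][w]) {
            victim = w;
            break;
        }
        if (c->age[set][w] > c->age[set][victim])
            victim = w;
    }
    c->valid[set][victim] = 1;
    c->tag[set][victim] = tag;
    c->age[set][victim] = 0;
}
```

Alternating between two addresses that map to the same set (0x0000 and 0x0100 with 4 sets) misses on every access in the direct-mapped configuration, while the two-way and fully associative configurations, with the same total of four lines, miss only on the first touch of each address.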

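6. ARM cache lookup example

Example: an L1 d-cache of 64KB, 4-way set-associative with 256 sets, with 64-byte cache lines.

Lookup in the L1 d-cache: the CPU issues a virtual address, which the MMU translates into a physical address. The index selects a set (because the cache is 4-way set-associative, this yields 4 cache lines). The tags are then compared (first checking the valid bit, then comparing the tag value), and finally the offset selects the specific bytes to return.

Note that in the A76 the L1 d-cache is "64KB, 4-way set-associative with 256 sets, 64-byte cache lines", and the L2 cache is "8-way set-associative, 128KB" (one of its configurable sizes). If the lookup misses in the L1 d-cache, the L2 cache is queried next.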
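The lookup steps above map directly onto code. The sketch below models only the example L1 d-cache (64KB, 4 ways, 256 sets, 64-byte lines) indexed by a physical address; `l1d_fill` is a hypothetical helper for loading a line, and replacement, dirty state and the L2 are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The example L1 d-cache: 64KB, 4 ways, 256 sets, 64-byte lines.
 * PA fields: offset = bits [5:0], index = bits [13:6], tag = bits [63:14]. */
#define WAYS 4
#define SETS 256
#define LINE 64

struct cache_line {
    bool valid;
    uint64_t tag;
    uint8_t data[LINE];
};

static struct cache_line l1d[SETS][WAYS];

static unsigned pa_offset(uint64_t pa) { return (unsigned)(pa % LINE); }
static unsigned pa_index(uint64_t pa) { return (unsigned)((pa / LINE) % SETS); }
static uint64_t pa_tag(uint64_t pa) { return pa / ((uint64_t)LINE * SETS); }

/* Hypothetical helper: load one line into the given way of its set. */
static void l1d_fill(uint64_t pa, unsigned way, const uint8_t src[LINE])
{
    assert(way < WAYS);
    struct cache_line *l = &l1d[pa_index(pa)][way];
    l->valid = true;
    l->tag = pa_tag(pa);
    memcpy(l->data, src, LINE);
}

/* Lookup: index -> 4 candidate lines, check the valid bit, compare the tag,
 * then use the offset to pick the byte. Returns false on a miss (next: L2). */
static bool l1d_lookup(uint64_t pa, uint8_t *out)
{
    for (unsigned way = 0; way < WAYS; way++) {
        const struct cache_line *l = &l1d[pa_index(pa)][way];
        if (l->valid && l->tag == pa_tag(pa)) {
            *out = l->data[pa_offset(pa)];
            return true;
        }
    }
    return false;
}
```

For the made-up PA `0x80012345`, the offset is 5, the index is 141 and the tag is `0x20004`; an address exactly 16KB (64 × 256 bytes) higher lands in the same set with a different tag and misses.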

7. ARM cache attributes and concepts

(1) Cache types

PIPT: Physically Indexed, Physically Tagged
VIVT: Virtually Indexed, Virtually Tagged
VIPT: Virtually Indexed, Physically Tagged (this is the type generally found in ARMv8 chips)
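For a VIPT cache, the index and offset bits together cover one way, i.e. cache size / ways. If one way is larger than a page, part of the index comes from VA bits that translation can change, so the same PA may be cached at more than one index (aliasing). The helper below, a sketch with illustrative inputs, counts those extra index bits. Many ARMv8 cores handle such aliases in hardware, so check the core's TRM before relying on this.

```c
#include <assert.h>
#include <stdint.h>

/* In a VIPT cache, index + offset span one way (cache size / ways). Index bits
 * above the page offset come from VA bits that translation may change, so one
 * PA can sit at several indexes (aliases). Returns how many such bits there are
 * (0 = no aliasing possible). */
static unsigned vipt_alias_bits(uint32_t cache_bytes, uint32_t ways, uint32_t page_bytes)
{
    assert(ways != 0 && page_bytes != 0);
    uint32_t way_bytes = cache_bytes / ways;
    unsigned bits = 0;

    while (way_bytes > page_bytes) {   /* each halving is one VA-dependent index bit */
        way_bytes >>= 1;
        bits++;
    }
    return bits;
}
```

For the 64KB, 4-way L1 with 4KB pages, one way is 16KB, which gives 2 alias bits; with 16KB pages, or for a 16KB 4-way cache with 4KB pages, the result is 0.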

(2) Read allocation and write allocation

The cache allocation policy consists of read allocation and write allocation:

• Read allocation

(In ARM documentation, a policy that allocates only on reads is also described as No-write allocate.)
When a CPU read misses in the cache, a cache line is always allocated to hold the data read from main memory. Caches support read allocation by default.

• Write allocation
Write allocation only matters when a CPU write misses in the cache. If write allocation is not supported, the write simply updates main memory and is done. If write allocation is supported, the data is first loaded from main memory into a cache line (effectively a read allocation), and then the data in that cache line is updated.
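The difference between the two allocation policies on a write miss can be shown with a one-line toy cache. To isolate the allocation decision, the sketch assumes write-through updates (main memory is always written); the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One-line toy cache in front of a tiny word-addressed "main memory". */
#define MEM_WORDS 16

struct wcache {
    bool write_allocate;      /* policy under test */
    bool valid;
    unsigned addr;            /* word address held by the single line */
    uint32_t data;
    uint32_t mem[MEM_WORDS];
};

static void store(struct wcache *c, unsigned addr, uint32_t v)
{
    assert(addr < MEM_WORDS);
    bool hit = c->valid && c->addr == addr;

    if (!hit && c->write_allocate) {
        c->valid = true;      /* allocate: fetch the line first (a read allocation) */
        c->addr = addr;
        c->data = c->mem[addr];
        hit = true;
    }
    if (hit)
        c->data = v;          /* update the cached copy */
    c->mem[addr] = v;         /* write-through assumed: memory always updated */
}
```

After a store that misses, the no-write-allocate cache stays empty and only memory holds the new value, while the write-allocate cache now holds the line with the new value.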

(3) Write-through and write-back

The cache update policy can be write-through or write-back:
• Write-through
When the CPU executes a store that hits in the cache, both the cache and main memory are updated, so the cache and main memory always hold the same data.

• Write-back
When the CPU executes a store that hits in the cache, only the cache is updated. Each cache line has a bit, called the dirty bit, that records whether its data has been modified. Main memory may still hold the old data while the modified data sits in the cache, so the cache and main memory may be inconsistent.
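The effect of the two update policies on memory traffic can be sketched by storing repeatedly to a line that is already cached and then evicting it. This hit-only toy model (hypothetical names, a single line) counts how many writes reach main memory and uses the dirty bit as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One cached line (hit case only): count writes that reach main memory. */
struct line_state {
    bool write_back;          /* false = write-through */
    bool dirty;
    uint32_t cached;
    uint32_t memory;
    unsigned mem_writes;
};

static void store_hit(struct line_state *s, uint32_t v)
{
    assert(s->write_back || !s->dirty);    /* write-through lines are never dirty */
    s->cached = v;
    if (s->write_back) {
        s->dirty = true;                   /* memory is stale until eviction */
    } else {
        s->memory = v;                     /* write-through: every store reaches memory */
        s->mem_writes++;
    }
}

static void evict(struct line_state *s)
{
    if (s->write_back && s->dirty) {      /* only dirty lines are written back */
        s->memory = s->cached;
        s->mem_writes++;
        s->dirty = false;
    }
}
```

Ten stores cause ten memory writes with write-through, but only one, at eviction, with write-back; until that eviction, the write-back line's memory copy is still stale.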

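(4) Inner and outer

(5) PoC and PoU

PoC (Point of Coherency) is the point at which all masters, such as cores, DSPs and DMA engines, see the same copy of a memory location. For cores, DSPs and DMA the coherent memory is usually main memory, so main memory is the PoC.

PoU (Point of Unification) is the point at which a core's instruction cache and data cache see the same copy of a memory location. This is usually the L2 cache; if the system has no L2 cache, the PoU is main memory.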
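The PoU matters in practice when a core writes instructions as data (a JIT, a loader, self-modifying code). The commonly used AArch64 sequence cleans the D-cache to the PoU with `DC CVAU`, then invalidates the I-cache to the PoU with `IC IVAU`, separated by barriers. The sketch below is an illustration rather than a drop-in implementation: it reads the line sizes from CTR_EL0, assumes EL0 access to CTR_EL0 and to these instructions is enabled (Linux typically allows this), and only builds the maintenance routine on AArch64. Portable code can use the compiler builtin `__builtin___clear_cache` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTR_EL0.DminLine (bits [19:16]) and IminLine (bits [3:0]) give log2 of the
 * smallest D-cache / I-cache line size in 4-byte words. */
static size_t dcache_line_bytes(uint64_t ctr) { return (size_t)4 << ((ctr >> 16) & 0xf); }
static size_t icache_line_bytes(uint64_t ctr) { return (size_t)4 << (ctr & 0xf); }

#if defined(__aarch64__)
/* Make instructions just written to [start, end) visible to instruction fetch. */
static void sync_icache_range(uintptr_t start, uintptr_t end)
{
    uint64_t ctr;
    assert(start <= end);
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    size_t dline = dcache_line_bytes(ctr), iline = icache_line_bytes(ctr);

    for (uintptr_t p = start & ~(uintptr_t)(dline - 1); p < end; p += dline)
        __asm__ volatile("dc cvau, %0" : : "r"(p) : "memory");  /* clean D to PoU */
    __asm__ volatile("dsb ish" : : : "memory");                 /* wait for the cleans */
    for (uintptr_t p = start & ~(uintptr_t)(iline - 1); p < end; p += iline)
        __asm__ volatile("ic ivau, %0" : : "r"(p) : "memory");  /* invalidate I to PoU */
    __asm__ volatile("dsb ish\n\tisb" : : : "memory");          /* complete, then refetch */
}
#endif
```

Only the CTR_EL0 decoding runs on other hosts. For example, a CTR_EL0 value whose DminLine and IminLine fields are both 4 (such as the hypothetical `0x8444c004`) means 64-byte lines.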

8. ARM Cortex-A76 cache overview

L1 i-cache: 64KB, 4-way set-associative with 256 sets, 64-byte cache lines
L1 d-cache: 64KB, 4-way set-associative with 256 sets, 64-byte cache lines

L2 cache: 8-way set-associative, size selectable as 128KB, 256KB or 512KB

L1 instruction TLB: fully associative, 48 entries, supports 4KB, 16KB, 64KB, 2MB and 32MB page sizes
L1 data TLB: fully associative, 48 entries, supports 4KB, 16KB, 64KB, 2MB and 512MB page sizes

L2 TLB: 5-way, 1280 entries

9. Summary of ARMv8 cache maintenance instructions

(1) Commonly used instructions

DC CISW, <Xt>: Data or unified Cache line Clean and Invalidate by Set/Way
DC CSW, <Xt>: Data or unified Cache line Clean by Set/Way
DC CVAU, <Xt>: Data or unified Cache line Clean by VA to PoU
DC ZVA, <Xt>: Data Cache Zero by VA
IC IALLU{, <Xt>}: Instruction Cache Invalidate All to PoU
IC IALLUIS{, <Xt>}: Instruction Cache Invalidate All to PoU, Inner Shareable
IC IVAU{, <Xt>}: Instruction Cache line Invalidate by VA to PoU
DC CIVAC, <Xt>: Data or unified Cache line Clean and Invalidate by VA to PoC
DC CVAC, <Xt>: Data or unified Cache line Clean by VA to PoC
DC CVAP, <Xt>: Data or unified Cache line Clean by VA to PoP (Point of Persistence)
DC GVA, <Xt>: Data Cache set Allocation Tag by VA
DC GZVA, <Xt>: Data Cache set Allocation Tags and Zero by VA
DC IGDSW, <Xt>: Data, Allocation Tag or unified Cache line Invalidate of Data and Allocation Tags by Set/Way
DC IGDVAC, <Xt>: Data, Allocation Tag or unified Cache line Invalidate of Data and Allocation Tags by VA to PoC
DC IGSW, <Xt>: Data, Allocation Tag or unified Cache line Invalidate of Allocation Tags by Set/Way
DC IGVAC, <Xt>: Data, Allocation Tag or unified Cache line Invalidate of Allocation Tags by VA to PoC
DC ISW, <Xt>: Data or unified Cache line Invalidate by Set/Way
DC IVAC, <Xt>: Data or unified Cache line Invalidate by VA to PoC
DC CVADP, <Xt>: Data or unified Cache line Clean by VA to PoDP (Point of Deep Persistence)

An example of using these instructions: cleaning and invalidating the data cache by set/way.
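Set/way operations such as `DC CISW` take their target as a packed operand in `Xt`: the way number in the top bits [31:32-A] with A = ceil(log2(ways)), the set number starting at bit L = log2(line size in bytes), and (cache level - 1) in bits [3:1]. The helper below builds that operand. In this sketch the geometry is passed in as parameters, whereas real code would read it from CLIDR_EL1 and CCSIDR_EL1 after selecting the level in CSSELR_EL1; set/way instructions can only execute at EL1 or higher.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(x)) for x >= 1 */
static unsigned ceil_log2(uint32_t x)
{
    unsigned n = 0;
    assert(x >= 1);
    while ((UINT64_C(1) << n) < x)
        n++;
    return n;
}

/* Xt operand for DC ISW / DC CSW / DC CISW:
 *   way        -> bits [31:32-A], A = ceil(log2(ways))
 *   set        -> starting at bit L, L = log2(line size in bytes)
 *   level - 1  -> bits [3:1]  (level 1 = L1) */
static uint64_t setway_operand(unsigned level, uint32_t set, uint32_t way,
                               uint32_t ways, uint32_t line_bytes)
{
    unsigned a = ceil_log2(ways);
    unsigned l = ceil_log2(line_bytes);

    assert(level >= 1 && level <= 7 && way < ways);
    return ((uint64_t)way << (32 - a)) | ((uint64_t)set << l) |
           ((uint64_t)(level - 1) << 1);
}

/* At EL1 a full clean + invalidate of one level would loop like this:
 *   for (way = 0; way < ways; way++)
 *       for (set = 0; set < sets; set++)
 *           asm volatile("dc cisw, %0" :: "r"(setway_operand(...)) : "memory");
 * followed by a DSB. Set/way instructions are not available at EL0. */
```

For the A76 L1 d-cache (4 ways, 64-byte lines), way 3 / set 255 encodes as `0xC0003FC0`; for an 8-way L2 with 64-byte lines, way 7 / set 511 at level 2 encodes as `0xE0007FC2`.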

(2) ARMv8.5 Memory Tagging (MemTag) cache maintenance instructions

ARMv8.5-MemTag:
DC CGDSW, <Xt>: Data, Allocation Tag or unified Cache line Clean of Data and Allocation Tags by Set/Way
DC CGDVAC, <Xt>: Data, Allocation Tag or unified Cache line Clean of Data and Allocation Tags by VA to PoC
DC CGDVADP, <Xt>: Data, Allocation Tag or unified Cache line Clean of Data and Allocation Tags by VA to PoDP
DC CGDVAP, <Xt>: Data, Allocation Tag or unified Cache line Clean of Data and Allocation Tags by VA to PoP
DC CGSW, <Xt>: Data, Allocation Tag or unified Cache line Clean of Allocation Tags by Set/Way
DC CGVAC, <Xt>: Data, Allocation Tag or unified Cache line Clean of Allocation Tags by VA to PoC
DC CGVADP, <Xt>: Data, Allocation Tag or unified Cache line Clean of Allocation Tags by VA to PoDP
DC CGVAP, <Xt>: Data, Allocation Tag or unified Cache line Clean of Allocation Tags by VA to PoP
DC CIGDSW, <Xt>: Data, Allocation Tag or unified Cache line Clean and Invalidate of Data and Allocation Tags by Set/Way
DC CIGDVAC, <Xt>: Data, Allocation Tag or unified Cache line Clean and Invalidate of Data and Allocation Tags by VA to PoC
DC CIGSW, <Xt>: Data, Allocation Tag or unified Cache line Clean and Invalidate of Allocation Tags by Set/Way
DC CIGVAC, <Xt>: Data, Allocation Tag or unified Cache line Clean and Invalidate of Allocation Tags by VA to PoC
