Peripheral data to the Ascend 310 inference card, part 2: dma_alloc_attrs


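In the previous article we covered several ways of allocating memory and looked closely at one parameter of the mmap approach. This article walks through the allocation process itself.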

Kernel source and paths

Path	Function / macro	Purpose
kernel/dma/mapping.c	dma_alloc_attrs	Common entry point for DMA buffer allocation
kernel/dma/coherent.c	dma_declare_coherent_memory, dma_alloc_from_dev_coherent	Allocation from a per-device coherent memory pool
kernel/dma/direct.c	dma_direct_alloc	Direct-mapping allocation (CMA or buddy allocator)
include/linux/dma-mapping.h	dma_alloc_coherent	Wrapper around dma_alloc_attrs
arch/arm64/mm/	arch_dma_prep_coherent	After the pages are allocated, gets their virtual address and calls __dma_flush_area
arch/arm64/mm/cache.S	__dma_flush_area	Clean & invalidate D / U lines
arch/arm64/include/asm/pgtable.h (rk3588 kernel tree)	pgprot_syscached	Mark the prot value as outer cacheable and inner non-cacheable

CONFIG_DMA_DECLARE_COHERENT

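This option enables per-device coherent memory pools. When a device has memory declared for it, either as a `shared-dma-pool` reserved-memory region with `no-map` or through dma_declare_coherent_memory(), dma_alloc_coherent() for that device is served from that memory. The option does not by itself make a device hardware-coherent. Hardware coherency means the device snoops the CPU caches, so no software cache maintenance is needed. It is declared separately, for example with the `dma-coherent` device-tree property. The table below compares the two kinds of device.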

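Item	Hardware-coherent device (`dma-coherent`)	Non-coherent device
Memory from `dma_alloc_coherent()`	Normal cacheable memory; the interconnect keeps CPU caches and device consistent	Mapped non-cacheable (or write-combine), so it is still coherent without sync calls
Streaming mappings (`dma_map_*`)	`dma_sync_*` is still called but does no cache maintenance	`dma_sync_*` (or map/unmap) must clean/invalidate the caches
Device tree / ACPI	Node has the `dma-coherent` property (ACPI: `_CCA`)	No special property
Performance	Higher (cached CPU access, no maintenance)	Lower (uncached CPU access or explicit cache maintenance)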

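How can you check whether the hardware really is coherent?

First, check whether the SoC or device manual states it. Examples include a master connected through ARM's ACP or a cache-coherent interconnect, or a PCIe root complex that snoops the CPU caches. As a test, the driver can deliberately omit the dma_sync_* calls and check whether transfers stay correct. A passing test is only weak evidence, because stale cache lines may simply not have been hit.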

Example DTS configuration

reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	my_coherent_pool: coherent_pool@10000000 {
		compatible = "shared-dma-pool";
		reg = <0x10000000 0x400000>; // 4 MB region
		no-map;
	};
};

my_device: my_device@0 {
	compatible = "vendor,coherent-device";
	memory-region = <&my_coherent_pool>; // associate the reserved region
	dma-coherent;
};
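A per-device pool like this is handled by kernel/dma/coherent.c. As far as I know, it tracks the region with a page bitmap and serves each request as a power-of-two number of pages at an aligned offset, using the same rounding get_order() does. The following is a userspace sketch of that idea under those assumptions. It is sized to the 4 MB region at 0x10000000 from the DTS above and assumes 4 KB pages. coherent_pool_model, pool_alloc and pool_free are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: names are hypothetical, values follow the DTS above. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define POOL_BASE        0x10000000UL                      /* reg base */
#define POOL_SIZE        0x400000UL                        /* 4 MB */
#define POOL_PAGES       (POOL_SIZE >> MODEL_PAGE_SHIFT)   /* 1024 pages */

struct coherent_pool_model {
    bool used[POOL_PAGES];   /* one flag per page, like the kernel's bitmap */
};

/* Smallest order with (page size << order) >= size, like get_order(). */
static unsigned int model_get_order(size_t size)
{
    unsigned int order = 0;
    while ((MODEL_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Reserve 2^order pages at a 2^order-aligned page index and return the
 * bus address of the first one; 0 means no suitable free run was found. */
static uintptr_t pool_alloc(struct coherent_pool_model *pool, size_t size)
{
    size_t n = (size_t)1 << model_get_order(size);

    for (size_t start = 0; start + n <= POOL_PAGES; start += n) {
        bool run_is_free = true;
        for (size_t i = start; i < start + n; i++) {
            if (pool->used[i]) {
                run_is_free = false;
                break;
            }
        }
        if (run_is_free) {
            for (size_t i = start; i < start + n; i++)
                pool->used[i] = true;
            return POOL_BASE + (start << MODEL_PAGE_SHIFT);
        }
    }
    return 0;
}

/* Release a run previously returned by pool_alloc() for the same size. */
static void pool_free(struct coherent_pool_model *pool, uintptr_t addr, size_t size)
{
    size_t start = (addr - POOL_BASE) >> MODEL_PAGE_SHIFT;
    size_t n = (size_t)1 << model_get_order(size);

    assert(addr >= POOL_BASE && start + n <= POOL_PAGES);
    for (size_t i = start; i < start + n; i++)
        pool->used[i] = false;
}
```

With this rounding, a 12 KB request occupies 16 KB of the pool. Keep that in mind when sizing the reserved region.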

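With the DTS and kernel configuration above, coherent allocations for this device come from the reserved region declared in the DTS rather than from CMA. The region is a `shared-dma-pool` with `no-map` and is not marked `reusable`. For example, a block of high memory can be reserved this way for our video interface.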

Alternatively, a driver can call dma_declare_coherent_memory() to associate a reserved physical range with the device, which likewise bypasses CMA.

dma_direct_alloc

Fast path for a special attribute (`DMA_ATTR_NO_KERNEL_MAPPING`)

if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && !force_dma_unencrypted(dev)) {
	page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
	if (!page)
		return NULL;
	*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
	return page; // returns the struct page as an opaque cookie, not a virtual address
}

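Use case: the kernel never needs to access the memory itself, for example a buffer used only by devices. The CPU must then not dereference the returned pointer.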
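The following userspace model makes the cookie semantics of the branch above concrete. The flag value, model_page and model_direct_alloc are hypothetical. calloc stands in for the zeroed kernel mapping of the normal path. The DMA address is assumed to equal the physical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag value for this model; not the kernel's constant. */
#define MODEL_ATTR_NO_KERNEL_MAPPING (1UL << 0)

struct model_page { uintptr_t phys; };   /* stands in for struct page */

struct model_alloc_result {
    void *ret;              /* what the allocator hands back */
    bool is_cpu_addr;       /* true: usable CPU pointer; false: opaque cookie */
    uintptr_t dma_handle;   /* what the device is programmed with */
};

/* Returns false if the model allocation fails. */
static bool model_direct_alloc(size_t size, uintptr_t phys, unsigned long attrs,
                               bool force_unencrypted, struct model_alloc_result *out)
{
    out->dma_handle = phys;   /* assumption: no CPU/bus address offset */
    if ((attrs & MODEL_ATTR_NO_KERNEL_MAPPING) && !force_unencrypted) {
        struct model_page *page = malloc(sizeof(*page));
        if (!page)
            return false;
        page->phys = phys;
        out->ret = page;          /* cookie only: never use it as data */
        out->is_cpu_addr = false;
        return true;
    }
    out->ret = calloc(1, size);   /* models the zeroed kernel mapping */
    if (!out->ret)
        return false;
    out->is_cpu_addr = true;
    return true;
}
```

A driver that uses DMA_ATTR_NO_KERNEL_MAPPING should pass the cookie back unchanged to dma_free_attrs with the same attrs.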

Main flow

1. Core memory allocation

page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);

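Obtains physically contiguous pages from CMA or from the buddy allocator.
__GFP_ZERO is masked out deliberately. The buffer is zeroed later with memset through the final CPU mapping, so it is not zeroed twice.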
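The allocation order described above, CMA first with the buddy allocator as fallback, can be sketched as follows. This is a simplified model under stated assumptions. Both allocators are replaced by hypothetical bump allocators, and the device's reach is expressed as a single DMA mask. Details of the real __dma_direct_alloc_pages are left out, such as whether CMA may be used in the current context and retries in lower zones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL

/* Hypothetical stand-in: each "allocator" is a simple bump allocator. */
struct toy_region { uint64_t next, end; };

enum page_src { SRC_NONE, SRC_CMA, SRC_BUDDY };

static bool toy_take(struct toy_region *r, uint64_t bytes, uint64_t *phys)
{
    if (r->end - r->next < bytes)
        return false;
    *phys = r->next;
    r->next += bytes;
    return true;
}

/* Like get_order(): smallest order with (page size << order) >= size. */
static unsigned int order_of(uint64_t size)
{
    unsigned int order = 0;
    while ((MODEL_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Like dma_coherent_ok(): the whole buffer must lie below the DMA mask. */
static bool addressable(uint64_t phys, uint64_t size, uint64_t dma_mask)
{
    return phys + size - 1 <= dma_mask;
}

static enum page_src model_alloc_pages(struct toy_region *cma, struct toy_region *buddy,
                                       uint64_t size, uint64_t dma_mask, uint64_t *phys)
{
    uint64_t aligned = (size + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
    uint64_t buddy_bytes = MODEL_PAGE_SIZE << order_of(aligned);

    /* 1) CMA: a contiguous run of exactly the page-aligned size. */
    if (toy_take(cma, aligned, phys)) {
        if (addressable(*phys, aligned, dma_mask))
            return SRC_CMA;
        cma->next -= aligned;        /* give it back: the device cannot reach it */
    }
    /* 2) Buddy allocator: 2^order contiguous pages. */
    if (toy_take(buddy, buddy_bytes, phys)) {
        if (addressable(*phys, aligned, dma_mask))
            return SRC_BUDDY;
        buddy->next -= buddy_bytes;  /* the kernel would retry in a lower zone */
    }
    return SRC_NONE;
}
```

In this model the two paths round differently. CMA hands out exactly the page-aligned size, while the buddy path rounds up to 2^order pages.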

2. Address translation

*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));

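Converts the physical address into a DMA address the device can use.
Handles any offset between the CPU's and the device's view of memory, for example a bus offset described by the device-tree `dma-ranges` property. IOMMU/SMMU translation is not part of this direct path.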
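On the direct path, the translation is a lookup over the ranges the firmware describes (on device-tree systems, `dma-ranges`). Below is a userspace sketch modelled on the kernel's struct bus_dma_region lookup. The struct, the error constant and the addresses used in the checks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DMA_MAPPING_ERROR (~(uint64_t)0)

/* Modelled on struct bus_dma_region: one CPU-to-bus window. */
struct model_dma_region {
    uint64_t cpu_start;
    uint64_t dma_start;
    uint64_t size;       /* a size of 0 terminates the table */
};

static uint64_t model_phys_to_dma(const struct model_dma_region *map, uint64_t paddr)
{
    if (!map)
        return paddr;    /* no ranges: bus address equals physical address */
    for (const struct model_dma_region *m = map; m->size; m++) {
        if (paddr >= m->cpu_start && paddr - m->cpu_start < m->size)
            return paddr - m->cpu_start + m->dma_start;
    }
    return MODEL_DMA_MAPPING_ERROR;   /* outside every window the device can reach */
}
```

Consider a bus that sees DDR at 0x0 while the CPU sees it at 0x80000000. In that case physical 0x80001000 becomes DMA address 0x1000. An address outside every range yields the error value, so the device is never programmed with it.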

3. Cache coherency handling

arch_dma_prep_coherent(page, size);

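Makes sure the CPU caches are consistent with memory. This runs right after allocation and before the buffer is first used. The implementation is architecture-specific. On ARM64 it cleans and invalidates the data cache over the range (__dma_flush_area).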
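This step matters because freshly allocated pages may still have dirty lines in the CPU cache through the kernel's cacheable linear mapping. The toy simulation below sketches why a clean + invalidate is needed for a non-coherent device. It assumes one write-back cache line per memory line, and all names are hypothetical; real hardware details such as prefetching are not modelled. It reproduces two failure modes: the device reading stale memory, and a later eviction overwriting data the device has written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64
#define NUM_LINES 4    /* a tiny 256-byte "buffer" */

/* Toy model of a non-coherent system: the device only ever sees mem[]. */
struct toy_system {
    uint8_t mem[NUM_LINES][LINE_SIZE];     /* DDR */
    uint8_t cache[NUM_LINES][LINE_SIZE];   /* CPU data cache contents */
    bool valid[NUM_LINES];
    bool dirty[NUM_LINES];
};

static void cpu_write(struct toy_system *s, unsigned line, unsigned off, uint8_t v)
{
    if (!s->valid[line]) {                 /* write-allocate: fetch the line */
        memcpy(s->cache[line], s->mem[line], LINE_SIZE);
        s->valid[line] = true;
    }
    s->cache[line][off] = v;
    s->dirty[line] = true;                 /* memory is now stale */
}

/* Clean (write back dirty lines) + invalidate, like __dma_flush_area. */
static void flush_range(struct toy_system *s)
{
    for (unsigned i = 0; i < NUM_LINES; i++) {
        if (s->valid[i] && s->dirty[i])
            memcpy(s->mem[i], s->cache[i], LINE_SIZE);
        s->valid[i] = false;
        s->dirty[i] = false;
    }
}

/* A natural eviction later on writes dirty lines back the same way. */
static void cache_evict_all(struct toy_system *s)
{
    flush_range(s);
}

static uint8_t device_read(const struct toy_system *s, unsigned line, unsigned off)
{
    return s->mem[line][off];              /* never looks at the CPU cache */
}

static void device_write(struct toy_system *s, unsigned line, unsigned off, uint8_t v)
{
    s->mem[line][off] = v;
}
```

Calling flush_range before the device touches the buffer avoids both problems, because nothing dirty is left to be written back over the device's data.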

Mapping

The choice of mapping is controlled mainly by the attributes, as described in the previous article.

Cache handling for different attrs

/*
 * Return the page attributes used for mapping dma_alloc_* memory, either in
 * kernel space if remapping is needed, or to userspace through dma_mmap_*.
 */
pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
{
	if (force_dma_unencrypted(dev))
		prot = pgprot_decrypted(prot);
	if (dev_is_dma_coherent(dev))
		return prot;
#ifdef CONFIG_ARCH_HAS_DMA_WRITE_COMBINE
	if (attrs & DMA_ATTR_WRITE_COMBINE)
		return pgprot_writecombine(prot);
#endif
	if (attrs & DMA_ATTR_SYS_CACHE_ONLY ||
	    attrs & DMA_ATTR_SYS_CACHE_ONLY_NWA)
		return pgprot_syscached(prot);
	return pgprot_dmacoherent(prot);
}

/* Generic fallback: caching disabled. arm64 overrides it to map as Normal Non-cacheable. */
#define pgprot_dmacoherent(prot)	pgprot_noncached(prot)
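The decision order of dma_pgprot can be restated as a pure function, which shows that attrs only matter for non-coherent devices. The attribute bit values and the enum are invented for this model (the real DMA_ATTR_* values differ), and the force_dma_unencrypted step is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute bits for this model only. */
#define M_ATTR_WRITE_COMBINE       (1UL << 0)
#define M_ATTR_SYS_CACHE_ONLY      (1UL << 1)
#define M_ATTR_SYS_CACHE_ONLY_NWA  (1UL << 2)

enum mapping_kind {
    MAP_CACHED,         /* prot returned unchanged */
    MAP_WRITECOMBINE,   /* pgprot_writecombine() */
    MAP_SYSCACHED,      /* pgprot_syscached(): inner NC, outer WB */
    MAP_DMACOHERENT,    /* pgprot_dmacoherent(): CPU caching off */
};

static enum mapping_kind model_dma_pgprot(bool dev_coherent, bool has_write_combine,
                                          unsigned long attrs)
{
    if (dev_coherent)
        return MAP_CACHED;                 /* hardware keeps caches coherent */
    if (has_write_combine && (attrs & M_ATTR_WRITE_COMBINE))
        return MAP_WRITECOMBINE;           /* only with CONFIG_ARCH_HAS_DMA_WRITE_COMBINE */
    if (attrs & (M_ATTR_SYS_CACHE_ONLY | M_ATTR_SYS_CACHE_ONLY_NWA))
        return MAP_SYSCACHED;
    return MAP_DMACOHERENT;                /* default for attrs == 0 */
}
```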

System cache attributes (ARM64)

/*
 * Mark the prot value as outer cacheable and inner non-cacheable. Non-coherent
 * devices on a system with support for a system or last level cache use these
 * attributes to cache allocations in the system cache.
 */
#define pgprot_syscached(prot) \
	__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
			PTE_ATTRINDX(MT_NORMAL_iNC_oWB) | PTE_PXN | PTE_UXN)
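The effect of pgprot_syscached on a page-table entry can be reproduced in userspace. __pgprot_modify clears the AttrIndx field (an index into MAIR) and writes the new index. It also sets PXN/UXN so the mapping is not executable. The bit positions below are the arm64 ones. MODEL_MT_NORMAL_iNC_oWB is a hypothetical slot number, since the real index depends on the kernel tree. The MAIR helper shows why such a slot would hold 0xF4: outer write-back goes in bits [7:4] and inner non-cacheable in bits [3:0].

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* arm64 descriptor fields. */
#define PTE_ATTRINDX(t)    ((pteval_t)(t) << 2)
#define PTE_ATTRINDX_MASK  ((pteval_t)7 << 2)
#define PTE_PXN            ((pteval_t)1 << 53)
#define PTE_UXN            ((pteval_t)1 << 54)

/* Hypothetical MAIR slot for MT_NORMAL_iNC_oWB in this model. */
#define MODEL_MT_NORMAL_iNC_oWB 6

/* MAIR policy nibbles for Normal memory. */
#define MAIR_NORMAL_WB_RWA 0xF   /* write-back, non-transient, read/write-allocate */
#define MAIR_NORMAL_NC     0x4   /* non-cacheable */

/* Same shape as __pgprot_modify(): clear the masked bits, then OR in new ones. */
static pteval_t model_pgprot_modify(pteval_t prot, pteval_t mask, pteval_t bits)
{
    return (prot & ~mask) | bits;
}

/* One MAIR attribute byte: outer policy in [7:4], inner policy in [3:0]. */
static uint8_t mair_attr(uint8_t outer, uint8_t inner)
{
    return (uint8_t)((outer << 4) | inner);
}
```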

dma_alloc_attrs

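The kernel's allocate-and-map flow has three main branches: the device's own coherent pool, direct mapping (dma-direct), and IOMMU-backed dma_map_ops. This article covers the first two. SMMU/IOMMU mapping is not covered for now.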
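The order in which dma_alloc_attrs (kernel/dma/mapping.c) tries these branches can be written as a small decision function. This is a sketch: model_device and the enum are hypothetical, each field stands for the kernel-side condition named in its comment, and details such as GFP zone masking are omitted.

```c
#include <assert.h>
#include <stdbool.h>

enum alloc_path { PATH_DEV_COHERENT_POOL, PATH_DMA_DIRECT, PATH_IOMMU_OPS, PATH_FAIL };

/* Hypothetical device description for the model. */
struct model_device {
    bool has_coherent_pool;  /* pool declared via DTS or dma_declare_coherent_memory() */
    bool uses_dma_direct;    /* dma_alloc_direct(): e.g. no dma_map_ops installed */
    bool ops_has_alloc;      /* dma_map_ops->alloc present, e.g. IOMMU DMA ops */
};

static enum alloc_path model_dma_alloc_attrs(const struct model_device *dev)
{
    if (dev->has_coherent_pool)
        return PATH_DEV_COHERENT_POOL;   /* dma_alloc_from_dev_coherent() */
    if (dev->uses_dma_direct)
        return PATH_DMA_DIRECT;          /* dma_direct_alloc() */
    if (dev->ops_has_alloc)
        return PATH_IOMMU_OPS;           /* ops->alloc() */
    return PATH_FAIL;
}
```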

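A side note on dma_alloc_coherent, which many tutorials use. It is only a wrapper around dma_alloc_attrs that passes attrs = 0, or DMA_ATTR_NO_WARN when __GFP_NOWARN is set. Following the analysis above, for a non-coherent device the kernel mapping of such a buffer is non-cacheable, i.e. the cache is turned off for it. For a hardware-coherent device, the normal cacheable attributes are kept.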

static inline void *dma_alloc_coherent(struct device *dev, size_t size,
		dma_addr_t *dma_handle, gfp_t gfp)
{
	return dma_alloc_attrs(dev, size, dma_handle, gfp,
			(gfp & __GFP_NOWARN) ? DMA_ATTR_NO_WARN : 0);
}

Summary

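This article walked through the DMA allocation flow, which falls into three cases:

1) The device has its own coherent pool, declared in the DTS or with dma_declare_coherent_memory(). Memory comes from that reserved DDR region, not through CMA.

2) Direct mapping: physical addresses are translated directly into DMA addresses (dma-direct).

3) The IOMMU case, implemented through dma_map_ops.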

The flow above shows that the DMA allocation interfaces can hand out both cached and non-cached mappings. Which drivers actually use the cached variants?

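And given all this, why are the streaming interfaces such as dma_map_single not built on top of dma_alloc_attrs?

Because dma_map_single only sets up a mapping for a buffer that already exists and allocates nothing, whereas dma_alloc_attrs first has to allocate the pages.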
