Go語言Map的底層原理

概念

map 又稱字典，是一種常用的數據結構，核心特征包含下述三點：

（1）存儲基于 key-value 對映射的模式；

（2）基于 key 維度實現存儲數據的去重；

（3）讀、寫、刪操作控制，時間復雜度 O(1).

****key 的類型要求
map 中，key 的數據類型必須為可比較的類型，切片、map、func不可比較

指針類型是可以比較的。

如果是結構體會怎么樣？

結構體中的所有字段的類型都必須是可比較的類型的才能作為map的key

type student struct {name stringage  int
}func TestMap(t *testing.T) {var m map[student]stringm = map[student]string{student{"Jane", 20}: "Jane",}t.Log(m)
}

同理數組也是一樣。

遍歷

在執行 map 遍歷操作時，獲取的 key-value 對并沒有一個固定的順序，因此前后兩次遍歷順序可能存在差異.

并發沖突

map 不是并發安全的數據結構，倘若存在并發讀寫行為，會拋出 fatal error.

具體規則是：

（1）并發讀沒有問題；

（2）并發讀寫中的“寫”是廣義上的，包含寫入、更新、刪除等操作；

（3）讀的時候發現其他 goroutine 在并發寫，拋出 fatal error；

（4）寫的時候發現其他 goroutine 在并發寫，拋出 fatal error.

fatal("concurrent?map?read?and?map?write")
fatal("concurrent?map?writes")

需要關注，此處并發讀寫會引發 fatal error，是一種比 panic 更嚴重的錯誤，無法使用 recover 操作捕獲.

核心原理

map 又稱為 hash map，在算法上基于 hash 實現 key 的映射和尋址；在數據結構上基于桶數組實現 key-value 對的存儲.

以一組 key-value 對寫入 map 的流程為例進行簡述：

（1）通過哈希方法取得 key 的 hash 值；

（2）hash 值對桶數組長度取模，確定其所屬的桶；

（3）在桶中插入 key-value 對.

hash 的性質，保證了相同的 key 必然產生相同的 hash 值，因此能映射到相同的桶中，通過桶內遍歷的方式鎖定對應的 key-value 對.

因此，只要在宏觀流程上，控制每個桶中 key-value 對的數量，就能保證 map 的幾項操作都限制為常數級別的時間復雜度.

Hash

hash 譯作散列，是一種將任意長度的輸入壓縮到某一固定長度的輸出摘要的過程，由于這種轉換屬于壓縮映射，輸入空間遠大于輸出空間（這里的壓縮是將無限的輸入空間壓縮成有限的輸出域），因此不同輸入可能會映射成相同的輸出結果. 此外，hash在壓縮過程中會存在部分信息的遺失，因此這種映射關系具有不可逆的特質.

（1）hash 的可重入性：相同的 key，必然產生相同的 hash 值；

（2）hash 的離散性：只要兩個 key 不相同，不論其相似度的高低，產生的 hash 值會在整個輸出域內均勻地離散化；

（3）hash 的單向性：企圖通過 hash 值反向映射回 key 是無跡可尋的.

（4）hash 沖突：由于輸入域（key）無窮大，輸出域（hash 值）有限，因此必然存在不同 key 映射到相同 hash 值的情況，稱之為 hash 沖突.

可能會感到有點沖突，但（2）和（4）并不相矛盾，離散型好的哈希函數只是減少沖突的概率，并不能完全避免。

桶數組

map中，會通過長度為2的整數次冪的桶數組進行 key-value 對的存儲：

（1）每個桶固定可以存放8個key-value對

（2）倘若超過 8 個 key-value 對打到桶數組的同一個索引當中，此時會通過創建桶鏈表的方式來化解這一問題。

在這里插入圖片描述

1.：hash沖突不同的key，可能會存在相同的hash值

2：不同的hash值通過對桶長度進行取模之后，也有可能會被打到同一個桶中。

綜上面兩點：不同的 key-value 可能被映射到 map 的同一個桶當中。

拉鏈法解決hash沖突

拉鏈法中，將命中同一個桶的元素通過鏈表的形式進行鏈接，因此很便于動態擴展.

Go map 的做法（拉鏈法的優化）：

每個 bucket 固定容納 8 個 key-value
如果滿了，就通過 overflow 字段掛一個溢出桶
實際上是一個 結構化拉鏈法（struct-based chaining）

開放尋址法解決hash沖突

所有元素都存在 哈希表本身。如果發生沖突，就按照某種“探測序列”尋找下一個可用位置。

常見策略：

線性探測（index+1）
二次探測（index+1^2, +2^2…）
雙重哈希（用另一個哈希函數重新計算偏移）

方法	優點
拉鏈法	簡單常用；無需預先為元素分配內存。
開放尋址法	無需額外的指針用于鏈接元素；內存地址完全連續，可以基于局部性原理，充分利用CPU高速緩存。

Go 的 map 實現確實結合了 開放尋址法 和 拉鏈法 的思想，但它并不是標準意義上的鏈表拉鏈法，而是桶鏈表 + 定長數組 + 溢出桶機制的混合方案。

（1）每個桶是一個結構體（Go 源碼中為bmap），包含：

一個 tophash 數組（快速判斷 key 匹配），數組長度為8，也就是說一個桶中可以存儲8個Key-Value.

當桶滿了，不是在桶內繼續鏈表擴展 key-value，而是通過一個 溢出指針數組

指向額外的桶（overflow bucket）

所以Go的map實際上是：哈希桶數組+每桶最多存儲8個kv+桶裝滿后追加溢出桶形成“桶鏈表”

// A bucket for a Go map.
type bmap struct {// tophash generally contains the top byte of the hash value// for each key in this bucket. If tophash[0] < minTopHash,// tophash[0] is a bucket evacuation state instead.tophash [abi.OldMapBucketCount]uint8// Followed by bucketCnt keys and then bucketCnt elems.// NOTE: packing all the keys together and then all the elems together makes the// code a bit more complicated than alternating key/elem/key/elem/... but it allows// us to eliminate padding which would be needed for, e.g., map[int64]int8.// Followed by an overflow pointer.
}

（2）key 命中一個桶后，在桶的 8 個位置中尋找空位插入，這就是類似開放尋址法的本地桶內查找

（3）如果桶的 8 個位置都被占滿，則找下一個溢出桶，重復第（2）步

（4）如果遍歷所有溢出桶都沒有空位，則新建溢出桶并插入

Go map 擴容機制

為什么要擴容？

如果桶數組的長度一直保持不變，那么隨著key-value對的增長，當一個桶下掛載的key-value達到一定的量級，時間復雜度上升，無法滿足訴求。

因此，為了將操作的時間復雜度維持在 O(1)，map 會在滿足一定條件時觸發擴容，以控制每個桶的平均負載在常量級別。

map 擴容機制的核心點包括：

（1）擴容分為增量擴容和等量擴容；

（2）增量擴容：當 map 的負載因子（key 數量 / 桶數量）大于 6.5 時，會觸發增量擴容，將桶數組長度擴大為原值的兩倍。

（3) 等量擴容：當桶內溢出桶數量大于等于 2^B 時( B 為桶數組長度的指數，B 最大取 15)，發生等量擴容，桶的長度保持為原值；等量擴容用于處理 hash 分布不均（熱點 key）導致的溢出桶爆炸問題，此時不會改變桶數量，而是重新散列所有 key。

（4）采用漸進擴容的方式，當桶被實際操作到時，由使用者負責完成數據遷移，避免因為一次性的全量數據遷移引發性能抖動。漸進遷移通過“讀寫操作觸發搬遷”的方式，把遷移成本分攤到用戶的操作中，避免瞬時卡頓。

漸進擴容的核心流程

擴容時，Go map 并不會馬上把所有舊桶遷移完，而是：

分配一份新的 bucket 數組（oldbuckets 和 newbuckets 并存）；
設置 oldbuckets 指針指向舊桶；
在后續的 每次寫入操作（插入/刪除）中，順便“搬一兩個舊桶”的數據到新桶中；
遷移的數據會被重新計算 hash 值并分配到新桶；
同時維護一個 nevacuate 指針記錄遷移進度，當 nevacuate 指向了所有舊桶的末尾，說明所有桶都搬遷完畢。
當所有舊桶都遷移完畢后，map 會把舊桶釋放，指針 oldbuckets 清空。才切換為新桶數組。

在 map 漸進擴容過程中，如何判斷一個舊桶的 key 應該遷移到新桶的哪個桶中？

在 Go 的 map 中，擴容時新桶數量 = 舊桶數量 × 2（增量擴容）。

因此：一個舊桶中的 key-value 對只可能被遷移到兩個新桶中的一個。

判斷規則是：對于舊桶中每個 key，根據 hash 值重新計算位置，看是落到原桶 b，還是 b + oldBucketCount。

因為舊桶的索引是 h & (oldBucketCount - 1)，即取 hash 的低 B-1 位。

擴容后只是多了一位（第 B 位），也就是說：

如果這第 B 位是 0：落到原桶 b；
如果這第 B 位是 1：落到新桶 b + oldBucketCount。

Go map的桶（bucket）結構是：

每個桶可以存放 最多 8 個 key-value 對；
為了加快桶內查找效率，Go 把 每個 key 的 hash 值的“高 8 位” 存進一個 tophash 數組（長度為 8，對應每個 key）；
而 hash 值的 低若干位（具體取決于桶數量） 用于決定 key 屬于哪個桶。

數據結構

hmap

type hmap struct {// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.// Make sure this stays in sync with the compiler's definition.count     int // # live cells == size of map.  Must be first (used by len() builtin)flags     uint8B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for detailshash0     uint32 // hash seedbuckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growingnevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)clearSeq   uint64extra *mapextra // optional fields
}

（1）count：map 中的 key-value 總數；

（2）flags：map 狀態標識，可以標識出 map 是否被 goroutine 并發讀寫；

（3）B：桶數組長度的指數，桶數組長度為 2^B；

（4）noverflow：map 中溢出桶的數量；

（5）hash0：hash 隨機因子，生成 key 的 hash 值時會使用到；

（6）buckets：桶數組；

（7）oldbuckets：擴容過程中老的桶數組；

（8）nevacuate：擴容時的進度標識，index 小于 nevacuate 的桶都已經由老桶轉移到新桶中；

（9）extra：預申請的溢出桶.

mapextra

這個mapextra是hmap中的一個“輔助字段”，專門用于管理溢出桶（overflow buckets）

type mapextra struct {// If both key and elem do not contain pointers and are inline, then we mark bucket// type as containing no pointers. This avoids scanning such maps.// However, bmap.overflow is a pointer. In order to keep overflow buckets// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.// overflow and oldoverflow are only used if key and elem do not contain pointers.// overflow contains overflow buckets for hmap.buckets.// oldoverflow contains overflow buckets for hmap.oldbuckets.// The indirection allows to store a pointer to the slice in hiter.overflow    *[]*bmapoldoverflow *[]*bmap// nextOverflow holds a pointer to a free overflow bucket.nextOverflow *bmap
}

在 map 初始化時，倘若容量過大，會提前申請好一批溢出桶，以供后續使用，這部分溢出桶存放在 hmap.mapextra 當中：

（1）mapextra.overflow：供桶數組 buckets 使用的溢出桶；

（2）mapextra.oldoverFlow: 擴容流程中，供老桶數組 oldBuckets 使用的溢出桶；

（3）mapextra.nextOverflow：下一個可用的溢出桶.

Go 會需要一個“溢出桶”來存放新插入的數據，這時候它有一個“拿溢出桶”的優先順序：

優先從 mapextra.nextOverflow 取：這個是預先分配好的一批溢出桶
如果 nextOverflow 沒桶了，就：去 overflow[] 列表里找是否有空閑的桶（可能是之前用過又回收的）
如果都沒有，才會向 Go 的內存分配器（heap）重新申請一個新的溢出桶。

bmap

// A bucket for a Go map.
type bmap struct {// tophash generally contains the top byte of the hash value// for each key in this bucket. If tophash[0] < minTopHash,// tophash[0] is a bucket evacuation state instead.tophash [abi.OldMapBucketCount]uint8// Followed by bucketCnt keys and then bucketCnt elems.// NOTE: packing all the keys together and then all the elems together makes the// code a bit more complicated than alternating key/elem/key/elem/... but it allows// us to eliminate padding which would be needed for, e.g., map[int64]int8.// Followed by an overflow pointer.
}