How Go maps are implemented - 戀戀美食's space - OSCHINA - Chinese open-source community https://my.oschina.net/renhc/blog/2208417
// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map. Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}
Go's map uses a hash table as its underlying implementation. A hash table contains multiple nodes, i.e. buckets, and each bucket stores one or more of the map's key/value pairs.
The map data structure is defined by runtime/map.go:hmap. Abbreviated:
type hmap struct {
	count int // number of elements currently stored
	...
	B uint8 // log_2 of the number of buckets
	...
	buckets unsafe.Pointer // pointer to the bucket array, whose length is 2^B
	...
}
The following figure shows a map with 4 buckets:
In this example, hmap.B = 2, so the length of hmap.buckets is 2^B = 4. After hashing, an element lands in one of the buckets for storage; lookup works the same way.
The "bucket" here is exactly what is usually called a hash bucket.
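The bucket-selection step can be sketched with a toy example. `bucketIndex` below is an illustrative helper, not the runtime's actual code, but the computation mirrors it: with hmap.B = 2 there are 2^2 = 4 buckets, and the low-order B bits of the hash pick the bucket.

```go
package main

import "fmt"

// bucketIndex picks a bucket from the low-order B bits of the hash,
// i.e. hash & (2^B - 1), which equals hash mod 2^B.
// (Illustrative helper; the real runtime works on raw pointers.)
func bucketIndex(hash uintptr, B uint8) uintptr {
	return hash & ((1 << B) - 1)
}

func main() {
	const B = 2 // 2^2 = 4 buckets, as in the figure above
	for _, hash := range []uintptr{0b0100, 0b0101, 0b1110, 0b1111} {
		fmt.Printf("hash %04b -> bucket %d\n", hash, bucketIndex(hash, B))
	}
}
```

Only the low bits matter for placement; the high bits are kept separately in tophash, as described in the next section.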
2. The bucket data structure
The bucket data structure is defined by runtime/map.go:bmap:
type bmap struct {
	tophash [8]uint8 // high 8 bits of each stored key's hash
	data    [1]byte  // key/value data: key/key/key/.../value/value/value...
	overflow *bmap   // address of the overflow bucket
}
Each bucket can store up to 8 key/value pairs.
tophash is an array of length 8. When keys whose hashes collide (more precisely, whose low-order hash bits match) are stored in the same bucket, the high 8 bits of each hash are recorded in this array to speed up later matching.
The data area holds the key/value data, laid out as key/key/key/.../value/value/value. Grouping all keys together and all values together avoids the space that alignment padding would otherwise waste.
The overflow pointer points to the next bucket, chaining together all the buckets holding colliding keys.
Note: data and overflow above are not declared explicitly in the struct; they are accessed via pointer arithmetic.
The following figure shows a bucket holding 8 key/value pairs:
package runtime
// This file contains the implementation of Go's map type.
//
// A map is just a hash table. The data is arranged
// into an array of buckets. Each bucket contains up to
// 8 key/elem pairs. The low-order bits of the hash are
// used to select a bucket. Each bucket contains a few
// high-order bits of each hash to distinguish the entries
// within a single bucket.
//
// If more than 8 keys hash to a bucket, we chain on
// extra buckets.
//
// When the hashtable grows, we allocate a new array
// of buckets twice as big. Buckets are incrementally
// copied from the old bucket array to the new bucket array.
//
// Map iterators walk through the array of buckets and
// return the keys in walk order (bucket #, then overflow
// chain order, then bucket index). To maintain iteration
// semantics, we never move keys within their bucket (if
// we did, keys might be returned 0 or 2 times). When
// growing the table, iterators remain iterating through the
// old table and must check the new table if the bucket
// they are iterating through has been moved ("evacuated")
// to the new table.
// Picking loadFactor: too large and we have lots of overflow
// buckets, too small and we waste a lot of space. I wrote
// a simple program to check some stats for different loads:
// (64-bit, 8 byte keys and elems)
// loadFactor %overflow bytes/entry hitprobe missprobe
// 4.00 2.13 20.77 3.00 4.00
// 4.50 4.05 17.30 3.25 4.50
// 5.00 6.85 14.77 3.50 5.00
// 5.50 10.55 12.94 3.75 5.50
// 6.00 15.27 11.67 4.00 6.00
// 6.50 20.90 10.79 4.25 6.50
// 7.00 27.14 10.15 4.50 7.00
// 7.50 34.03 9.73 4.75 7.50
// 8.00 41.10 9.40 5.00 8.00
//
// %overflow = percentage of buckets which have an overflow bucket
// bytes/entry = overhead bytes used per key/elem pair
// hitprobe = # of entries to check when looking up a present key
// missprobe = # of entries to check when looking up an absent key
//
// Keep in mind this data is for maximally loaded tables, i.e. just
// before the table grows. Typical tables will be somewhat less loaded.
import (
	"runtime/internal/atomic"
	"runtime/internal/math"
	"runtime/internal/sys"
	"unsafe"
)

const (
	// Maximum number of key/elem pairs a bucket can hold.
	bucketCntBits = 3
	bucketCnt     = 1 << bucketCntBits

	// Maximum average load of a bucket that triggers growth is 6.5.
	// Represent as loadFactorNum/loadFactorDen, to allow integer math.
	loadFactorNum = 13
	loadFactorDen = 2

	// Maximum key or elem size to keep inline (instead of mallocing per element).
	// Must fit in a uint8.
	// Fast versions cannot handle big elems - the cutoff size for
	// fast versions in cmd/compile/internal/gc/walk.go must be at most this elem.
	maxKeySize  = 128
	maxElemSize = 128

	// data offset should be the size of the bmap struct, but needs to be
	// aligned correctly. For amd64p32 this means 64-bit alignment
	// even though pointers are 32 bit.
	dataOffset = unsafe.Offsetof(struct {
		b bmap
		v int64
	}{}.v)

	// Possible tophash values. We reserve a few possibilities for special marks.
	// Each bucket (including its overflow buckets, if any) will have either all or none of its
	// entries in the evacuated* states (except during the evacuate() method, which only happens
	// during map writes and thus no one else can observe the map during that time).
	emptyRest      = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
	emptyOne       = 1 // this cell is empty
	evacuatedX     = 2 // key/elem is valid. Entry has been evacuated to first half of larger table.
	evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
	evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
	minTopHash     = 5 // minimum tophash for a normal filled cell.

	// flags
	iterator     = 1 // there may be an iterator using buckets
	oldIterator  = 2 // there may be an iterator using oldbuckets
	hashWriting  = 4 // a goroutine is writing to the map
	sameSizeGrow = 8 // the current map growth is to a new map of the same size

	// sentinel bucket ID for iterator checks
	noCheck = 1<<(8*sys.PtrSize) - 1
)
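The reserved tophash values above interact with the runtime's tophash() helper, which extracts the top 8 bits of the hash and bumps them past the special marks. A simplified sketch (adapted from runtime/map.go, using uint64 in place of uintptr):

```go
package main

import "fmt"

const minTopHash = 5 // values 0-4 are reserved for the special marks above

// tophash extracts the high 8 bits of the hash for the bmap.tophash array.
// Results below minTopHash are shifted up so a normal cell's tophash never
// collides with the reserved emptyRest/emptyOne/evacuated* markers.
func tophash(hash uint64) uint8 {
	top := uint8(hash >> (64 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	fmt.Println(tophash(0xAB00000000000000)) // high byte 0xAB passes through: 171
	fmt.Println(tophash(0x0300000000000000)) // high byte 3 < minTopHash, bumped to 8
}
```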
3. Hash collisions
When two or more keys hash to the same bucket, we say those keys collide. Go resolves collisions with separate chaining. Since each bucket can hold 8 key/value pairs, once a bucket would have to hold more than 8 pairs, another bucket is created and linked to it, chaining buckets together much like a linked list.
The following figure shows a map after collisions have occurred:
The bucket that the next-bucket pointer in bmap refers to is called an overflow bucket: it holds the entries the current bucket could not fit. Collisions are not a good thing, since they reduce access efficiency. A good hash function keeps hash values well distributed, but excessive collisions still have to be controlled; more on this later.
4. Load factor
The load factor measures how crowded a hash table is:
load factor = number of keys / number of buckets
For example, a hash table with 4 buckets holding 4 key/value pairs has a load factor of 1.
A hash table needs to keep its load factor within a suitable range; past a threshold it must rehash, i.e. reorganize its key/value pairs:
A load factor that is too small means poor space utilization.
A load factor that is too large means heavy collisions and slow access.
Implementations tolerate different load factors. Redis triggers a rehash once the load factor exceeds 1, while Go waits until it reaches 6.5. The reason is that each Redis bucket can hold only 1 key/value pair, whereas a Go bucket can hold 8, so Go can tolerate a much higher load factor.
5. Incremental growth
5.1 Preconditions for growth
To keep access efficient, every insertion into a map first checks whether the table needs to grow; growing is essentially trading space for time.
Two conditions trigger growth:
1. load factor > 6.5, i.e. on average each bucket stores more than 6.5 key/value pairs.
2. the number of overflow buckets > 2^15, i.e. more than 32768 overflow buckets.
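The two trigger conditions follow the shape of the runtime's overLoadFactor and tooManyOverflowBuckets checks. A simplified sketch with the constants inlined (not the exact runtime code):

```go
package main

import "fmt"

const (
	bucketCnt     = 8  // max key/elem pairs per bucket
	loadFactorNum = 13 // load factor threshold 13/2 = 6.5
	loadFactorDen = 2
)

// overLoadFactor reports whether count elements in 2^B buckets exceed the
// 6.5 average load, using integer math: count > 13 * 2^B / 2.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uint64(count) > loadFactorNum*(uint64(1)<<B)/loadFactorDen
}

// tooManyOverflowBuckets reports whether the overflow-bucket count is too
// high relative to 2^B main buckets; the threshold is capped at 2^15.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<B
}

func main() {
	fmt.Println(overLoadFactor(9, 0))              // 9 elems in 1 bucket: grow
	fmt.Println(overLoadFactor(52, 3))             // exactly 6.5 average: no grow yet
	fmt.Println(tooManyOverflowBuckets(1<<15, 20)) // 2^15 overflow buckets: grow
}
```

The first condition leads to doubling growth (5.2); the second, when the load factor is still low, leads to same-size growth (5.3).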
5.2 Doubling growth
When the load factor is too large, a new bucket array twice the length of the old one is allocated, and the data in the old buckets is then migrated over.
If the map holds hundreds of millions of key/value pairs, migrating everything at once would cause a noticeable pause, so Go migrates gradually: each write to a growing map also evacuates a small portion of the old data (up to two of the old buckets at a time).
The following figure shows a map whose single bucket is full (for clarity, the value area is omitted):
This map stores 7 key/value pairs in its only bucket, so its load factor is 7. Inserting the next entry will trigger growth, after which the new key is written into the new buckets.
When the 8th key/value pair is inserted, growth is triggered; afterwards the map looks like this:
In hmap, oldbuckets now points to the original bucket array while buckets points to the newly allocated one, and new key/value pairs go into the new buckets. Subsequent map operations trigger migration, gradually moving the key/value pairs out of oldbuckets; once oldbuckets is fully evacuated, it is released.
After migration completes, the map looks like this:
During migration, the pairs from the old bucket end up at the front of the new bucket, with newly inserted pairs behind them. The actual migration is more involved and will be covered in the later source-code analysis.
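During a doubling grow, each key in an old bucket is reexamined to decide whether it stays at the same index (the "X" half of the larger table) or moves to index + old size (the "Y" half); the choice depends on the hash bit that becomes significant once the table doubles. A simplified sketch with a hypothetical helper name (the runtime's evacuate() also moves the data itself):

```go
package main

import "fmt"

// evacDest computes where a key from old bucket oldIdx lands after the table
// doubles from 2^oldB to 2^(oldB+1) buckets. The newly significant hash bit
// picks between the X half (same index) and the Y half (index + 2^oldB).
func evacDest(hash uintptr, oldIdx uintptr, oldB uint8) uintptr {
	if hash&(1<<oldB) == 0 {
		return oldIdx // X: first half of the larger table
	}
	return oldIdx + 1<<oldB // Y: second half
}

func main() {
	// Old table: B=2 (4 buckets); both keys were in bucket 1 (low 2 bits = 01).
	fmt.Println(evacDest(0b0101, 1, 2)) // bit 2 set -> bucket 1+4 = 5 (Y)
	fmt.Println(evacDest(0b0001, 1, 2)) // bit 2 clear -> bucket 1 (X)
}
```

This is why the evacuatedX/evacuatedY tophash markers in the constants above come in pairs: they record which half an entry was moved to.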
5.3 Same-size growth
Same-size growth does not actually enlarge the table: the number of buckets stays the same, and a migration pass like the one above simply repacks the scattered key/value pairs so that buckets are used more densely and access is faster. This matters in an extreme scenario: repeated inserts and deletes can leave the remaining pairs concentrated in a small portion of the buckets, so overflow buckets pile up even though the load factor stays low and doubling growth is never triggered, as shown below:
In the figure above, most of the overflow buckets are nearly empty and access efficiency suffers. A same-size grow, keeping the bucket count fixed, reorganizes the entries so that fewer overflow buckets remain, both saving space and improving access speed.
6. Lookup
Lookup proceeds as follows:
1. Compute the hash of the key.
2. Use the low-order B bits of the hash (i.e. hash mod 2^B) to select the bucket.
3. Compare the high 8 bits of the hash against the entries of the tophash array.
4. If tophash[i] matches, compare the key stored at that slot in the bucket against the lookup key.
5. If the key is not found in the current bucket, continue through its overflow buckets.
6. If the map is in the middle of a grow, look in oldbuckets first when the corresponding bucket has not been evacuated yet.
Note: a failed lookup does not return a null value; it returns the zero value of the element type.
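The zero-value behavior on a miss is visible directly at the language level; the comma-ok form is how a missing key is told apart from a stored zero:

```go
package main

import "fmt"

func main() {
	m := map[string]int{"a": 1}

	// Missing key: a plain lookup yields the element type's zero value.
	fmt.Println(m["missing"]) // 0

	// The comma-ok form distinguishes a missing key from a stored zero.
	v, ok := m["missing"]
	fmt.Println(v, ok) // 0 false

	v, ok = m["a"]
	fmt.Println(v, ok) // 1 true
}
```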
7. Insertion
Inserting a new element proceeds as follows:
1. Compute the hash of the key.
2. Use the low-order B bits of the hash (i.e. hash mod 2^B) to select the bucket.
3. Check whether the key already exists in the bucket; if so, update its value in place.
4. If the key is not found, insert it.
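Steps 3-4 correspond to ordinary assignment at the language level: assigning to an existing key updates its value in place, while assigning to a new key inserts it.

```go
package main

import "fmt"

func main() {
	m := make(map[string]int)

	m["a"] = 1 // key absent: insert
	m["a"] = 2 // key present: update in place, len unchanged
	m["b"] = 3 // another insert

	fmt.Println(m["a"], m["b"], len(m)) // 2 3 2
}
```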