Golang map

Implementation Principle of map#

Go language uses a hash lookup table and uses linked lists to resolve hash conflicts.

Memory model:

type hmap struct{
   count        int     // number of K,V pairs. len() returns this value
   flags        uint8   // flag, whether there are other goroutines executing write operations
   B            unit8   // log of the number of buckets  2^B
   noverflow    uint16  // approximate number of overflow buckets used
   hash0        uint32    
   buckets      unsafe.Pointer  // pointer to buckets
   oldbuckets   unsafe.Pointer  // old buckets, when the length of buckets is equal to oldbuckets during equal expansion, the length of buckets will be twice that of oldbuckets during double expansion
   nevacuate    uintptr     // next old bucket number to be migrated during expansion
   extra        *mapextra   // information about overflow buckets

mapextra {
    // overflow[0] contains overflow buckets for hmap.buckets.
	// overflow[1] contains overflow buckets for hmap.oldbuckets.
	overflow        [2]*[]*bmap
    nextoverflow    *bmap

buckets is a pointer that points to a structure:

type bmap struct{
   tophash [bucketCnt]uint8 

During compilation, a new structure is dynamically created:

type bmap struct{
    topbits     [8]int8
    keys        [8]keytype
    values      [8]valuetype
    pad         uintptr
    overflow    uintptr

bmap: the bucket we often talk about, which can hold up to 8 keys. The high 8 bits of the hash value calculated based on the key determine which position the key falls into in the bucket (a bucket can have up to 8 positions).

When both the key and value of the map are not pointers, and the size is less than 128 bytes, the bmap is marked as containing no pointers, which avoids scanning the entire hmap during garbage collection.
However, as we can see, bmap actually has an overflow field, which is of pointer type, which breaks the assumption that bmap does not contain pointers. In this case, the overflow is moved to the extra field.

What is the difference between slice and map as function parameters?

When a map and a slice are used as function parameters, operations on the map inside the function will affect the map itself, while operations on the slice will not.

The main reason is that one is a pointer (*hmap) and the other is a structure (slice).
In Go language, function parameters are passed by value, and inside the function, the parameters are copied to the local variables.
After the pointer *hmap is copied, it still points to the same map, so operations on the map inside the function will affect the original argument.
However, after the slice is copied, it becomes a new slice, and operations on it will not affect the original argument.

Hash Function#

One key point of a map is the choice of the hash function. At program startup, the CPU is checked to see if it supports AES. If it does, AES hash is used; otherwise, memhash is used. This is done in the alginit() function, located at src/runtime/alg.go.
The choice of the hash function mainly considers two factors: performance and collision probability.

Key Positioning#

The number of buckets m = 2^B
Modulo: hash%m, but the overhead is too large, so bitwise operations are used in the code, hash&(m-1)

Resolution of hash conflicts:
"Chaining": create a new bucket behind the conflicting bucket
Open addressing: traverse the buckets, check if the key is equal or nil

map Traversal Process#

The expansion process is not an atomic operation. At most, only two buckets are moved at a time. If the expansion operation is triggered, for a long period of time, the state of the map will be in an intermediate state: some buckets have been moved to new locations, while others are still in their old locations.
The difficulty lies in: if the traversal occurs during the expansion process, it will involve the process of traversing the new and old buckets.

View assembly commands:

go tool compile -S main.go

// ......
0x0124 00292 (test16.go:9)      CALL    runtime.mapiterinit(SB)

// ......
0x01fb 00507 (test16.go:9)      CALL    runtime.mapiternext(SB)
0x0200 00512 (test16.go:9)      MOVQ    ""..autotmp_4+160(SP), AX
0x0208 00520 (test16.go:9)      TESTQ   AX, AX
0x020b 00523 (test16.go:9)      JNE     302

First, the mapiterinit function is called to initialize the iterator (hiter), and then the mapiternext function is called repeatedly to iterate over the map.

// Generate a random number r
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
	r += uintptr(fastrand()) << 31

// Start the traversal from which bucket
it.startBucket = r & (uintptr(1)<<h.B - 1)
// Start the traversal from which cell in the bucket
it.offset = uint8(r >> h.B & (bucketCnt - 1))

Assignment Process#

mapassign function
Check the flags flag

Calculate the hash value of the key, and according to the previous process, find the position to be assigned (which may be inserting a new key or updating an old key), and assign the value to the corresponding position.
The source code is similar to what we discussed before, the core is still a double loop, the outer loop traverses the bucket and its overflow bucket, and the inner loop traverses the various cells of the entire bucket.

Prepare two pointers, one (inserti) points to the position of the hash value of the key in the tophash array, and the other (insertk) points to the position of the cell (which is the address where the key is finally placed). Of course, the position of the value can be easily determined.
These three are actually related. The index position in the tophash array determines the position of the key in the entire bucket (8 keys in total), and the position of the value needs to "cross" the length of 8 keys.
In the process of the loop, inserti and insertk respectively point to the first empty cell found.
If the key is not found in the map, that is, the key does not exist in the original map, it means inserting a new key. The final placement address of the key is the first "vacant" position discovered (tophash is empty).
If all 8 keys in this bucket are already full, after exiting the loop, it is found that inserti and insertk are both empty. At this time, an overflow bucket needs to be attached to the back of the bucket.

Before the key is officially placed, the state of the map needs to be checked to see if it needs to be expanded. If the expansion conditions are met, a expansion operation is triggered proactively.

After that, the entire process of finding and locating the key needs to be repeated. Because after the expansion, the distribution of keys has changed.
Finally, the values related to the map will be updated. If a new key is inserted, the count field of the map element will be increased by 1; the hashWriting write flag set at the beginning of the function will be cleared.

ps: The pointer returned by the mapassign function points to the position of the value corresponding to the key, and with the address, it is easy to perform the assignment.

Deletion Process#

First, check the flags flag to see if there are other goroutines performing write operations. If so, return panic directly.
Calculate the hash of the key and find the bucket it falls into. Check if the map is in the expansion process and perform a migration operation if necessary.
Find the specific position of the key, and traverse each cell in the bucket to find the corresponding position. After finding it, clear the key and value.
Finally, decrement the count field of the hamp interface by 1 and set the corresponding position of the tophash to empty.

Expansion Process#

When is expansion needed:

  1. Load factor > 6.5. locadFactor := count / (2^B)
  2. Too many overflow buckets: B<15, count > 2^B; B>15, count > 2^15

hashGrow() allocates new buckets and hangs the old buckets on the oldbuckets field.
The actual action of moving buckets is in the growWork() function, and the action of calling the growWork() function is in the mapassign and mapdelete functions.
olddbuckets is nil when the expansion is completed.

Why are the keys unordered?#

After the map is expanded, the keys will be moved. Keys that originally fell into the same bucket will have to fly far away after the move (the bucket number is increased by 2^B).
During the traversal, the buckets are traversed in order, and the keys in the bucket are also traversed in order. After the move, the positions of the keys have changed significantly, and some keys have flown high, while others have remained in place. As a result, the result of traversing the map is unlikely to be in the original order.

When traversing the map, we do not start from bucket 0 every time, but start from a random bucket number, and start traversing from a random cell number in this bucket. Even if you have a fixed map and just traverse it, it is unlikely to return a fixed sequence of key/value pairs.

ps: The feature "the result of iterating the map is unordered" was added starting from Go 1.0.

Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.