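Consistent hashing is a data-placement scheme for distributed computing that copes with node changes far better than plain modulo or range partitioning.
In telecom billing it can serve as the distribution algorithm between several message interface machines and the online charging hosts, keyed on session_id. When the charging hosts are scaled out or in, noticeably fewer sessions have to be let through uncharged because their session_id is missing from the cache.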

The traditional modulo approach

Take 10 data items and 3 nodes. With modulo placement the assignment is:
node a: 0,3,6,9
node b: 1,4,7
node c: 2,5,8
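
When one more node is added, the distribution changes to:
node a: 0,4,8
node b: 1,5,9
node c: 2,6
node d: 3,7

Summary: items 3, 4, 5, 6, 7, 8 and 9 (7 of the 10) all have to be moved when the node is added, which is far too expensive.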
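
The count above can be reproduced with a few lines of Python. This is only a sketch of the modulo scheme as described here; the function name assign_by_modulo is made up for illustration.

```python
def assign_by_modulo(keys, node_names):
    """Map each integer key to node_names[key % number_of_nodes]."""
    return {key: node_names[key % len(node_names)] for key in keys}


keys = range(10)
before = assign_by_modulo(keys, ["a", "b", "c"])
after = assign_by_modulo(keys, ["a", "b", "c", "d"])

# Keys whose owning node changed once the fourth node was added.
moved = [key for key in keys if before[key] != after[key]]
print(moved)  # prints [3, 4, 5, 6, 7, 8, 9]
```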
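
The consistent hashing approach
The key difference is that both the nodes and the data items are hashed, and the two sets of hash values are then compared: each item is stored on the node whose hash is the nearest one at or above the item's own hash. This keeps the amount of data affected by adding or removing a node to a minimum.
Take the same example again, using the sum of a string's ASCII codes as the hash key. Item n is keyed by the four-character string nnnn (0 is 0000, 1 is 1111, and so on), as in the test code at the end of this post.
The ten data items hash to: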
0:192
1:196
2:200
3:204
4:208
5:212
6:216
7:220
8:224
9:228

The three nodes, each keyed as name:0 (for example a:0), hash to:
node a: 203
node g: 209
node z: 228
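
Now compare the two sets of hash values. Each item goes to the first node whose hash is greater than or equal to its own. An item whose hash is larger than 228 wraps around to the first node, 203, so the hash space as a whole forms a ring. The resulting mapping is: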
node a: 0,1,2
node g: 3,4
node z: 5,6,7,8,9
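
The lookup rule just described can be sketched as follows. The key formats (a:0 for nodes, 0000 for data) follow the test code in this post; the helper names are invented for illustration, and bisect is used here in place of a linear scan, which is an assumption about how one might write it rather than how the reposted code does it.

```python
import bisect


def ascii_sum(text):
    """Toy hash from the example: the sum of the character codes."""
    return sum(ord(ch) for ch in text)


def build_ring(node_names):
    """Return a sorted list of (hash, node) points, one point per node."""
    return sorted((ascii_sum("%s:0" % name), name) for name in node_names)


def lookup(ring, data_key):
    """Return the first node whose hash is >= the hash of data_key."""
    hashes = [point_hash for point_hash, _ in ring]
    index = bisect.bisect_left(hashes, ascii_sum(data_key))
    # Past the last point, wrap around to the first point on the ring.
    return ring[index % len(ring)][1]


ring = build_ring(["a", "g", "z"])
placement = {}
for digit in "0123456789":
    placement.setdefault(lookup(ring, digit * 4), []).append(digit)
print(placement)
```

A key such as "~~", whose ASCII sum is 252 and therefore larger than 228, wraps around and lands on node a.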

Now add node n and compute its hash:
node n: 216

The data is then migrated accordingly:
node a: 0,1,2
node g: 3,4
node n: 5,6
node z: 7,8,9

This time only items 5 and 6 have to be moved.
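
To check which items change owner when a node joins, one can compute the owner of every key before and after and compare the two. The sketch below does this under the same assumptions as the example (ASCII-sum hash, node keys of the form a:0, data keys of the form 0000); the function name owners is hypothetical.

```python
import bisect


def ascii_sum(text):
    """Toy hash from the example: the sum of the character codes."""
    return sum(ord(ch) for ch in text)


def owners(node_names, data_keys):
    """Map each data key to the first node at or after it on the ring."""
    points = sorted((ascii_sum(name + ":0"), name) for name in node_names)
    hashes = [point_hash for point_hash, _ in points]
    result = {}
    for key in data_keys:
        index = bisect.bisect_left(hashes, ascii_sum(key)) % len(points)
        result[key] = points[index][1]
    return result


data_keys = [digit * 4 for digit in "0123456789"]
before = owners(["a", "g", "z"], data_keys)
after = owners(["a", "g", "n", "z"], data_keys)

# Keys whose owner differs, with (old node, new node).
moved = {key: (before[key], after[key])
         for key in data_keys if before[key] != after[key]}
print(moved)  # prints {'5555': ('z', 'n'), '6666': ('z', 'n')}
```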
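In addition, if each of the three nodes contributes only one hash value, comparing those against the data hashes easily gives an unbalanced split. This is why virtual nodes are introduced: by appending an ID suffix (or something similar) to each node name, every node yields n hash values spread around the ring, so the data hashes are distributed far more evenly across the nodes (see replicas in the code below).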
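
The effect of virtual nodes is easiest to see by counting keys per node. The sketch below is an illustration only: it assumes MD5 as the hash (the ASCII sum clusters too tightly to show the effect), and the session-N key names and the figure of 3000 keys are made up for the demonstration. The exact counts depend on the hash, so none are quoted here.

```python
import bisect
import hashlib
from collections import Counter


def md5_hash(text):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def build_ring(node_names, replicas):
    """Place `replicas` virtual points per node, keyed as 'name:index'."""
    return sorted(
        (md5_hash("%s:%d" % (name, i)), name)
        for name in node_names
        for i in range(replicas)
    )


def load_per_node(ring, data_keys):
    """Count how many keys land on each physical node."""
    hashes = [point_hash for point_hash, _ in ring]
    counts = Counter()
    for key in data_keys:
        index = bisect.bisect_left(hashes, md5_hash(key)) % len(ring)
        counts[ring[index][1]] += 1
    return counts


# Hypothetical keys, used only to exercise the ring.
data_keys = ["session-%d" % n for n in range(3000)]
for replicas in (1, 100):
    ring = build_ring(["a", "g", "z"], replicas)
    print(replicas, dict(load_per_node(ring, data_keys)))
```

Running it with 1 and then 100 points per node lets the two per-node counts be compared directly.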
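
Distributing data with this algorithm greatly reduces the amount of data that has to be migrated when nodes are added or removed.

The hash ring code below is reposted from elsewhere. Its gen_key has been changed to the ASCII-sum method described above, so that the results are easy to test and verify by hand.
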
import hashlib  # only needed for the md5 variant of gen_key


class HashRing(object):

    def __init__(self, nodes=None, replicas=3):
        """Manages a hash ring.

        `nodes` is a list of objects that have a proper __str__ representation.
        `replicas` indicates how many virtual points should be used per node;
        replicas are required to improve the distribution.
        """
        self.replicas = replicas
        self.ring = dict()
        self._sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)

    def add_node(self, node):
        """Adds a `node` to the hash ring (including a number of replicas)."""
        for i in range(self.replicas):
            key = self.gen_key('%s:%s' % (node, i))
            print("node %s-%s key is %d" % (node, i, key))
            self.ring[key] = node
            self._sorted_keys.append(key)
        self._sorted_keys.sort()

    def remove_node(self, node):
        """Removes `node` from the hash ring and its replicas."""
        for i in range(self.replicas):
            key = self.gen_key('%s:%s' % (node, i))
            del self.ring[key]
            self._sorted_keys.remove(key)

    def get_node(self, string_key):
        """Given a string key a corresponding node in the hash ring is returned.

        If the hash ring is empty, `None` is returned.
        """
        return self.get_node_pos(string_key)[0]

    def get_node_pos(self, string_key):
        """Given a string key a corresponding node in the hash ring is returned
        along with its position in the ring.

        If the hash ring is empty, (`None`, `None`) is returned.
        """
        if not self.ring:
            return None, None
        key = self.gen_key(string_key)
        nodes = self._sorted_keys
        for i in range(len(nodes)):
            node = nodes[i]
            # The first ring point at or above the key's hash owns the key.
            if key <= node:
                print("string_key %s key %d" % (string_key, key))
                print("get node %s-%d " % (self.ring[node], i))
                return self.ring[node], i
        # The key hashes past the last point: wrap around to the first one.
        return self.ring[nodes[0]], 0

    def print_ring(self):
        """Prints every slot of the ring in hash order."""
        if not self.ring:
            return
        nodes = self._sorted_keys
        for i in range(len(nodes)):
            node = nodes[i]
            print("ring slot %d is node %s, hash value is %s" % (i, self.ring[node], node))

    def get_nodes(self, string_key):
        """Given a string key it returns the nodes as a generator that can hold the key.

        The generator is never ending and iterates through the ring
        starting at the correct position.
        """
        if not self.ring:
            yield None, None
            return  # nothing to iterate over; avoids spinning on an empty ring
        node, pos = self.get_node_pos(string_key)
        for key in self._sorted_keys[pos:]:
            yield self.ring[key]
        while True:
            for key in self._sorted_keys:
                yield self.ring[key]

    def gen_key(self, key):
        """Given a string key it returns an integer value;
        this value represents a place on the hash ring.

        The sum of the ASCII codes is used here so the results can be checked
        by hand. The original md5 version, which mixes well, is kept below
        as a comment.
        """
        # m = hashlib.md5()
        # m.update(key.encode('utf-8'))
        # return int(m.hexdigest(), 16)
        hash_value = 0
        for ch in key:
            hash_value += ord(ch)
        return hash_value


data_keys = ['0000', '1111', '2222', '3333', '4444',
             '5555', '6666', '7777', '8888', '9999']

# Three nodes, one point per node.
memcache_servers = ['a', 'g', 'z']
ring = HashRing(memcache_servers, 1)
ring.print_ring()
for data_key in data_keys:
    server = ring.get_node(data_key)

print('----------------------------------------------------------')

# The same data after node n has joined.
memcache_servers = ['a', 'g', 'n', 'z']
ring = HashRing(memcache_servers, 1)
ring.print_ring()
for data_key in data_keys:
    server = ring.get_node(data_key)