https://en.wikipedia.org/wiki/Consistent_hashing .
To Run python3.7 -m test
What is normal hashing ?
Let's say we have to store the following key value pair in a distributed memory store like redis.
Id | Name |
---|---|
Id-1 | Smith |
Id-2 | Jackson |
Id-3 | Thomas |
Id-4 | Tom |
Id-5 | Jerry |
Let say we have a hash function f(id) ,that takes above ids and creates hashes from it .
Assume we have 3 servers - (s1 , s2 and s3)
We can do a modulo of hash by the no of servers ie 3 , to map each each key to a server and we are left with following.
Id | Name | Hash f(Id) | Server(Hash % No of servers) |
---|---|---|---|
Id-1 | Smith | 1123 | node-1 |
Id-2 | Jackson | 1211 | node-2 |
Id-3 | Thomas | 1600 | node-1 |
Id-4 | Tom | 1801 | node-1 |
Id-5 | Jerry | 1788 | node-0 |
This looks perfecto , yea close but not cigar !
But What if a server say node-1 went down ?
Applying the same formula ie f(id)%(no of servers) for user Jackson, 1211%2 = 1
ie we got node-1 when the actual key is hashed to node-2 from the above table .
We could do remapping here , What if we have a billion keys ,in that case we have to remap a large no of keys which is tedious :(
This is a major flow in traditional hashing technique.
What is Consistent Hashing ?
In Consistent hashing , we visualize list of all nodes in a circular ring .(Basically a sorted array)
start func
For each node:
Find f(node) where f is the hash function
Append each f(node) to a sorted array
For any key
Compute the hash f(key)
Find the first f(node)>f(key)
map it
end func
for example, if we have to hash key smith, we compute the hash value 1123 , find the immediate node having hash value > 1123 ie node 3 with hash value 1500
Now , What if we loose a server , say we loose node-2 , All the keys can be mapped to next server node-3 :)
Yea , we only have to remap the keys of node-2