Implement a load balancer for web servers. It provides the following functionality:
Add a new server to the cluster => add(server_id).
Remove a bad server from the cluster => remove(server_id).
Pick a server in the cluster randomly with equal probability => pick().
Motivations:
Each individual server has finite memory and capacity, so load must be spread across many.
Load balancers can provide a service with no downtime and easier maintenance.
For example, say you needed to update your website. A load balancer allows you to do this without shutting down your entire system.
Handle and sustain more total connections (as the demand for the product increases).
Maximize resource utilization, and minimize response time.
Further Possible Designs (scheduling algorithms):
Round Robin
Fewest number of connections
Least current load, on the basis of...
Current CPU occupation
Current memory occupation
Randomized algorithm:
Uniform selection, or fit a beta distribution that best models the constraints of your system.
Weighted sampling on the basis of server performance metrics such as memory capacity or CPU performance.
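As a sketch of the round-robin option above, a deque that rotates after each pick cycles through the servers in order. The class and method names here (RoundRobinBalancer, pick) are illustrative and not part of the implementations below:

```python
from collections import deque

# illustrative round-robin scheduler: servers are picked in a fixed cycle
class RoundRobinBalancer:
    def __init__(self):
        self.queue = deque()

    def add(self, server_id):
        self.queue.append(server_id)

    def remove(self, server_id):
        # O(n) scan; a dict of deque nodes could make this faster
        self.queue.remove(server_id)

    def pick(self):
        server = self.queue[0]
        self.queue.rotate(-1)  # next pick returns the following server
        return server

lb = RoundRobinBalancer()
for s in ["a", "b", "c"]:
    lb.add(s)
print([lb.pick() for _ in range(4)])  # → ['a', 'b', 'c', 'a']
```

Round robin is trivially fair in request counts, but unlike the weighted scheme below it ignores how loaded each server actually is.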
Other Issues:
Requests made by the same user should be routed to the same server (session affinity, a.k.a. sticky sessions), e.g. so per-user cached state stays valid.
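A minimal sketch of that idea, assuming a stable hash of the user id (the sticky_pick helper is hypothetical): mapping the hash to an index keeps a user on the same server as long as cluster membership is unchanged.

```python
import hashlib

def sticky_pick(servers, user_id):
    # hashlib gives a stable digest; Python's built-in hash() is salted
    # per process and would break affinity across restarts
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["s0", "s1", "s2"]
# the same user always lands on the same server
assert sticky_pick(servers, "alice") == sticky_pick(servers, "alice")
```

Note that plain modulo hashing remaps most users whenever a server is added or removed; consistent hashing is the usual remedy for that.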
Complexity: O(1) time for all operations, O(N) space
Implementation 1
import random

# randomized load balancer
class LoadBalancer1:
    def __init__(self):
        self.servers = []
        self.server_hash = {}

    def add(self, id):
        self.server_hash[id] = len(self.servers)
        self.servers.append(id)

    def remove(self, id):
        if id not in self.server_hash:
            raise ValueError("Server ID does not exist")
        index = self.server_hash[id]
        # swap with the last element for O(1) removal (order doesn't matter)
        self.servers[-1], self.servers[index] = \
            self.servers[index], self.servers[-1]
        # reflect the swap in the dictionary: the server that moved into
        # position `index` must now hash to that position
        self.server_hash[self.servers[index]] = index
        # remove from the end of the list and from the dictionary
        self.servers.pop()
        del self.server_hash[id]

    def select(self):
        return random.choice(self.servers)


# sanity checking
def test1(removals):
    random.seed(42)
    N_SERVERS = 10
    obj = LoadBalancer1()
    print("adding")
    for i in range(N_SERVERS):
        obj.add(i)
    print("selecting")
    for i in range(N_SERVERS):
        print(obj.select())
    print("removing: ", removals)
    for i in removals:
        obj.remove(i)
    print("selecting")
    for i in range(N_SERVERS):
        print(obj.select())


test1([1, 3, 4, 6, 9])
test1(list(reversed([1, 3, 4, 6, 9])))
Complexity: O(1) time for add, remove, and select (random.choice on a list is constant time), O(N) space. The weighted variant below keeps add and remove at O(1) but pays O(n) per select, since it rebuilds the CDF on every call -> which potentially might be unscalable depending on the constraints of the system design.
Implementation 2
# weighted load balancer
class LoadBalancer2:
    def __init__(self):
        self.cum_sum = 0
        self.servers = []
        self.weights = []
        # sole purpose is to hash into the list
        self.server_hash = {}

    def add(self, id, weight):
        assert 0 < weight <= 1
        self.server_hash[id] = len(self.servers)
        self.weights.append(weight)
        self.servers.append(id)
        self.cum_sum += weight

    def remove(self, id):
        if id not in self.server_hash:
            raise ValueError("Server ID does not exist")
        index = self.server_hash[id]
        self.cum_sum -= self.weights[index]
        # swap with the last element for O(1) removal (order doesn't matter)
        self.servers[-1], self.servers[index] = \
            self.servers[index], self.servers[-1]
        self.weights[-1], self.weights[index] = \
            self.weights[index], self.weights[-1]
        # reflect the swap in the dictionary
        self.server_hash[self.servers[index]] = index
        # remove from the end of the lists and from the dictionary
        self.servers.pop()
        self.weights.pop()
        del self.server_hash[id]

    def select(self):
        pmf = [weight / self.cum_sum for weight in self.weights]
        cdf = [sum(pmf[:i + 1]) for i in range(len(pmf))]
        rnd = random.uniform(0, 1)
        # find the CDF interval containing rnd, starting from 0
        if rnd <= cdf[0]:
            return self.servers[0]
        for i in range(1, len(cdf)):
            if cdf[i - 1] <= rnd <= cdf[i]:
                return self.servers[i]
        # floating-point rounding can leave cdf[-1] slightly below 1
        return self.servers[-1]


# sanity checking
def test2(removals):
    random.seed(42)
    N_SERVERS = 10
    obj = LoadBalancer2()
    print("adding")
    for i in range(N_SERVERS):
        obj.add(i, random.uniform(0, 1))
    print("selecting")
    for i in range(N_SERVERS):
        print(obj.select())
    print("removing: ", removals)
    for i in removals:
        obj.remove(i)
    print("selecting")
    for i in range(N_SERVERS):
        print(obj.select())


test2([1, 3, 4, 6, 9])
test2(list(reversed([1, 3, 4, 6, 9])))
The Idea: We can use a combination of a list and a hash table to produce constant-time operations. The hash table is used solely to locate a server's position within the list. To obtain constant-time removals, I've drawn the diagram below. Essentially, we first swap the element we want to remove with the last element of the list, and then pop from the end of that list. To reflect these changes in the hash table, the swapped element takes over the index of the deleted element (since it was swapped into its position).
This next implementation considers a weighted version of each server. Servers that are able to sustain and serve a greater load (for example, by having greater memory capacity or a more powerful CPU) might be given a greater weight to increase their probability of being selected. I've drawn a diagram below describing how weighted random sampling works. It essentially comes down to building the cumulative distribution array from the probability mass function (which is just the weights normalized to sum to 1), then drawing a random number between 0 and 1 and returning the server whose CDF interval contains that number.
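If the O(n) select becomes a bottleneck, the cumulative weights can be precomputed once and searched with binary search, giving O(log n) selection for a fixed server set. This is a sketch, not part of the implementation above; make_weighted_selector is a hypothetical helper, and Python's built-in random.choices(servers, weights=...) uses the same cumulative-weights-plus-bisect approach internally.

```python
import bisect
import itertools
import random

def make_weighted_selector(servers, weights):
    # precompute the running sums of the weights once, e.g. [3, 1, 2] -> [3, 4, 6]
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]

    def select():
        rnd = random.uniform(0, total)
        # index of the first cumulative weight >= rnd: O(log n)
        return servers[bisect.bisect_left(cumulative, rnd)]

    return select

select = make_weighted_selector(["a", "b", "c"], [3, 1, 2])
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(10000):
    counts[select()] += 1
# "a" should be picked roughly half the time, "b" roughly a sixth
```

The trade-off versus the class above is that adds and removals would require rebuilding the cumulative array (O(n)), so this fits workloads where selects vastly outnumber membership changes.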