Comment on page
Load Balancer
Implement a load balancer for web servers. It provide the following functionality:
Add a new server to the cluster =>
add(server_id)
.Remove a bad server from the cluster =>
remove(server_id)
.Pick a server in the cluster randomly with equal probability =>
pick()
.Motivations:
- Finite memory of server
- Load Balancers can provide a service that has no downtime and better maintenance.
- For example, say you needed to update your website. A load balancer allows you to do this without shutting down your entire system.
- Handle and sustain more total connections (as the demand for the product increases).
- Maximize resources, and minimize response time.
Further Possible Designs (scheduling algorithms):
- Round Robin
- Fewest number of connections
- Fewest current load on the basis of...
- Current CPU occupation
- Current memory occupation
- Randomized algorithm:
- Uniform selection, or fit a beta distribution that best models the constraints of your system.
- Weighed Sampling on the basis of certain performance metrics of the server like memory capacity or CPU performance.
Other Issues:
- Requests made by same user should be cached to the same server Id.
The Idea: We can use a combination of a list and hash table to produce constant time operations. The hash table will solely used for identification purposes when identifying the server within the list. In order to obtain constant time removals, I've drawn the diagram below. Essentially, we first the element we want to remove to the end of the list, and then pop from that list. To reflect these changes in the hash table, the swapped element copies the value of the deleted element (since it was swapped).
_removal_swap.png?generation=1568003901721638&alt=media)
Complexity: O(1) time for all operations, O(N) space
Implementation 1
import random
# randomized load balancer
class LoadBalancer1:
def __init__(self):
self.servers = []
self.server_hash = {}
def add(self, id):
self.server_hash[id] = len(self.servers)
self.servers.append(id)
def remove(self, id):
if id not in self.server_hash:
raise ValueError("Server ID does not exist")
index = self.server_hash[id]
# swap for O(1) removal (order doesnt matter)
self.servers[-1], self.servers[index] = \
self.servers[index], self.servers[-1]
# reflect the changes onto the dictionary
self.server_hash[len(self.servers)] = index
# remove from end of list and dictionary
self.servers.pop()
del self.server_hash[id]
def select(self):
return random.choice(self.servers)
# sanity checking
def test1(removals):
random.seed(42)
N_SERVERS = 10
obj = LoadBalancer1()
print("adding")
for i in range(0, N_SERVERS):
obj.add(i)
print("selecting")
for i in range(0, N_SERVERS):
print(obj.select())
print("removing: ", removals)
for i in removals:
obj.remove(i)
print("selecting")
for i in range(0, N_SERVERS):
print(obj.select())
test1([1, 3, 4, 6, 9])
test1(reversed([1, 3, 4, 6, 9]))
This next implementation considers a weighted version of each server. Servers that are able to maintain and serve a greater load by for example, having greater memory capacities, or more powerful CPU requirements; might be given a greater weight to increase the probability of becoming selected. I've drawn a diagram below to describe how weighted random sampling works. It essentially comes down to first building the cumulative distribution array from the probability mass function (which are just a normalized scaling of the weights), and then randomly selecting a number between 0 and 1, and returning that interval in the cdf is responsible in containg this.

Complexity: O(1) time for add, removal, but O(n) for select -> which potentially might be unscalable depending on the constraints of the system design.
Implementation 2
# weighted load balancer
class LoadBalancer2:
def __init__(self):
self.cum_sum = 0
self.servers = []
self.weights = []
# sole purpose is to hash into the list
self.server_hash = {}
def add(self, id, weight):
assert(weight <= 1)
self.server_hash[id] = len(self.servers)
self.weights.append(weight)
self.servers.append(id)
self.cum_sum += weight
def remove(self, id):
if id not in self.server_hash:
raise ValueError("Server ID does not exist")
index = self.server_hash[id]
self.cum_sum -= self.weights[index]
# swap for O(1) removal (order doesnt matter)
self.servers[-1], self.servers[index] = \
self.servers[index], self.servers[-1]
self.weights[-1], self.weights[index] = \
self.weights[index], self.weights[-1]
# reflect the changes onto the dictionary
self.server_hash[len(self.servers)] = index
# remove from end of list and dictionary
self.servers.pop()
self.weights.pop()
del self.server_hash[id]
def select(self):
pmf = [weight/self.cum_sum for weight in self.weights]
cdf = [sum(pmf[:i+1]) for i, x in enumerate(pmf)]
rnd = random.uniform(0, 1)
# start from 0
if 0 <= rnd <= cdf[0]:
return self.servers[0]
# now initiate the rest
for i in range(1, len(cdf)):
if cdf[i-1] <= rnd <= cdf[i]:
return self.servers[i]
print("Shouldn't get here")
# sanity checking
def test2(removals):
random.seed(42)
N_SERVERS = 10
obj = LoadBalancer2()
print("adding")
for i in range(0, N_SERVERS):
obj.add(i, random.uniform(0, 1))
print("selecting")
for i in range(0, N_SERVERS):
print(obj.select())
print("removing: ", removals)
for i in removals:
obj.remove(i)
print("selecting")
for i in range(0, N_SERVERS):
print(obj.select())
test2([1, 3, 4, 6, 9])
test2(reversed([1, 3, 4, 6, 9]))
Last modified 4yr ago