I use the following code to update the Beta posterior on each trial and recommend an arm (I use scipy.stats.beta):
    self.prior = (1.0, 1.0)

    def get_recommendation(self):
        sampled_theta = []
        for i in range(self.arms):
            # Construct the Beta posterior for arm i
            dist = beta(self.prior[0] + self.successes[i],
                        self.prior[1] + self.trials[i] - self.successes[i])
            # Draw a sample from the posterior
            sampled_theta.append(dist.rvs())
        # Return the index of the arm with the largest sampled value
        return sampled_theta.index(max(sampled_theta))
But currently it only works if the rewards are binary (success or failure). I want to modify it so that it works for non-binary rewards (e.g. rewards of 2300, 2000, ...). How do I do that?
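One direction I am considering, in case it helps frame an answer: if the rewards are bounded, I could scale each reward into [0, 1] and treat it as the success probability of a Bernoulli trial, so the Beta update stays the same. Below is a minimal sketch of that idea, not something I have validated. The class name `BoundedRewardSampler`, the `update` method, and the `max_reward` parameter are my own placeholders, and I assume an upper bound on the reward is known in advance. It uses `random.betavariate` from the standard library in place of `scipy.stats.beta` so that it runs on its own.

```python
import random


class BoundedRewardSampler:
    """Beta-posterior Thompson sampling for rewards in [0, max_reward]."""

    def __init__(self, arms, max_reward, seed=None):
        self.arms = arms
        self.max_reward = float(max_reward)  # assumed known upper bound
        self.prior = (1.0, 1.0)              # uniform Beta(1, 1) prior
        self.successes = [0] * arms
        self.trials = [0] * arms
        self.rng = random.Random(seed)

    def get_recommendation(self):
        sampled_theta = []
        for i in range(self.arms):
            # Posterior parameters for arm i
            a = self.prior[0] + self.successes[i]
            b = self.prior[1] + self.trials[i] - self.successes[i]
            # Draw a sample from the Beta posterior
            sampled_theta.append(self.rng.betavariate(a, b))
        # Return the index of the arm with the largest sampled value
        return sampled_theta.index(max(sampled_theta))

    def update(self, arm, reward):
        # Scale the reward into [0, 1], clipping values outside the bound
        p = min(max(reward / self.max_reward, 0.0), 1.0)
        # Convert the scaled reward into a binary outcome with probability p
        self.trials[arm] += 1
        if self.rng.random() < p:
            self.successes[arm] += 1
```

With a hypothetical bound of `max_reward=3000`, a reward of 2300 would count as a success with probability 2300/3000. I am unsure whether this is appropriate when the rewards have no natural upper bound, or whether a different posterior (for example, a Gaussian one) would be the better choice.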