Is multiarm bandit a choice when there is very low reward

65 views Asked by At

Is any version of multiarm bandit (EpsilonGreedy, Thompson Sampling, UCB) any good when there is very low reward/click rate for the high pull rate. I have 600 piece of content with approximately 3000 clicks (total across all content) per day for a volume of approximately million requests. With this would it be useful to implement MAB, is this rate of click any statistical significance for the algorithm.

1

There are 1 answers

0
Sanit On

Do the 600 pieces of content change every day or do they stay the same? If they stay the same, then an asymptotically optimal algorithm would start performing extremely well soon enough.

Even if the pieces of content change, Thompson Sampling should still work and give you something which significantly better than random. I have run various experiments with Thompson Sampling for my research and it seems to start doing well very quickly on most of them.