Is a multi-armed bandit a good choice when the reward rate is very low?

Question


55 views · Asked by Aravind Chamakura on 11 December 2018 at 07:58

Is any version of the multi-armed bandit (epsilon-greedy, Thompson Sampling, UCB) any good when the reward (click) rate is very low relative to the pull rate? I have 600 pieces of content with approximately 3,000 clicks per day (total across all content) for a volume of approximately a million requests. Given this, would it be useful to implement a MAB? Is this click rate high enough to give the algorithm a statistically meaningful signal?


There is 1 answer

**Sanit** · Answer 1 · 2020-02-10T07:32:27+00:00

Do the 600 pieces of content change every day or do they stay the same? If they stay the same, then an asymptotically optimal algorithm would start performing extremely well soon enough.
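How soon "soon enough" is depends on how much data each arm collects. As a rough back-of-the-envelope sketch using the numbers from the question, the snippet below computes the overall click rate and the clicks each piece of content would see per day. It assumes, purely for illustration, that the million requests are per day, that traffic is split evenly across the 600 arms, and that all arms share the same click rate; the function name `daily_stats` is hypothetical.

```python
import math

# Figures taken from the question; "per day" for requests is an assumption.
requests_per_day = 1_000_000
clicks_per_day = 3_000
n_arms = 600  # pieces of content


def daily_stats(requests, clicks, arms):
    """Per-arm daily figures under an even traffic split (illustrative assumption)."""
    ctr = clicks / requests                 # overall click-through rate
    pulls_per_arm = requests / arms         # impressions each arm receives per day
    clicks_per_arm = ctr * pulls_per_arm    # expected clicks each arm receives per day
    # Standard error of a per-arm CTR estimate after one day (binomial approximation)
    std_err = math.sqrt(ctr * (1 - ctr) / pulls_per_arm)
    return ctr, pulls_per_arm, clicks_per_arm, std_err


ctr, pulls_per_arm, clicks_per_arm, std_err = daily_stats(
    requests_per_day, clicks_per_day, n_arms
)
print(f"overall CTR:            {ctr:.4%}")
print(f"pulls per arm per day:  {pulls_per_arm:.0f}")
print(f"clicks per arm per day: {clicks_per_arm:.1f}")
print(f"std. error of arm CTR:  {std_err:.5f}")
```

Under these assumptions each arm sees only about five clicks a day, so the one-day uncertainty on a single arm's click rate is a large fraction of the rate itself. Estimates sharpen as days accumulate, which is why it matters whether the content stays the same.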

Even if the pieces of content change, Thompson Sampling should still work and give you something which is significantly better than random. I have run various experiments with Thompson Sampling for my research and it seems to start doing well very quickly on most of them.
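For reference, a minimal sketch of Beta-Bernoulli Thompson Sampling for click/no-click rewards might look like the following. This is not the code from the experiments mentioned above. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the simulated click rates are all assumptions chosen for illustration, and the arm count is kept far below 600 so the pure-Python loop runs quickly.

```python
import random


def thompson_sampling(true_ctrs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling against hypothetical click rates."""
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms   # times each arm was shown
    clicks = [0] * n_arms  # clicks each arm received
    for _ in range(n_rounds):
        # Draw one plausible CTR per arm from its posterior,
        # Beta(1 + clicks, 1 + non-clicks), i.e. a uniform prior updated with data.
        samples = [
            rng.betavariate(1 + clicks[i], 1 + pulls[i] - clicks[i])
            for i in range(n_arms)
        ]
        # Show the arm whose sampled CTR is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate the user's response using the (hidden) true click rate.
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        pulls[arm] += 1
        clicks[arm] += reward
    return pulls, clicks


# Hypothetical low-CTR setup: nine arms at 0.2% and one better arm at 1%.
true_ctrs = [0.002] * 9 + [0.01]
pulls, clicks = thompson_sampling(true_ctrs, n_rounds=5_000)
print("pulls per arm: ", pulls)
print("clicks per arm:", clicks)
```

Running a simulation like this with click rates and arm counts close to your own is a cheap way to see how many requests the algorithm needs before it reliably favours the better content.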
