Using Reinforcement Learning or Not? How to Solve a Specific Optimization Problem?

I have an optimization problem that I do not know how to approach. At each timestep I receive a simple binary observation, make a decision based on it, and then receive the next observation. After 1000 timesteps, I obtain a single value indicating how effective the chosen decisions were. My objective is to find the decision policy that maximizes this final value. What methods can I use to solve this problem?
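
To make the setup concrete, here is a minimal sketch of the interaction loop described above. The names `run_episode`, `toy_env_step`, `toy_final_score`, and `copy_policy` are hypothetical placeholders chosen for illustration; the real dynamics and scoring function of my problem are different and not shown here.

```python
import random

T = 1000  # episode length, as in the description above


def run_episode(policy, env_step, final_score, seed=0):
    """Roll out one episode and return the single end-of-episode value."""
    rng = random.Random(seed)
    obs = 0  # initial binary observation (assumed to start at 0)
    history = []
    for t in range(T):
        action = policy(obs, t)           # decide based on the observation
        history.append((obs, action))
        obs = env_step(obs, action, rng)  # receive the next binary observation
    # The only feedback: one number, available after the last timestep.
    return final_score(history)


# Hypothetical placeholder: the next observation is a fair coin flip.
def toy_env_step(obs, action, rng):
    return rng.randint(0, 1)


# Hypothetical placeholder: fraction of steps where the action matched the observation.
def toy_final_score(history):
    return sum(1 for obs, action in history if action == obs) / len(history)


# Example policy: repeat the observation as the action.
def copy_policy(obs, t):
    return obs


score = run_episode(copy_policy, toy_env_step, toy_final_score)
```

Anything that maps `(obs, t)` to an action can be passed as `policy`, so the same loop could be reused to compare candidate policies by their final value alone.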

While reinforcement learning seems like a natural fit, I am concerned that it may struggle to recognize that actions at the beginning of an episode can influence the final value just as much as actions at the end.
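
To illustrate this concern: when the only reward R arrives at the last step of a T-step episode, the discounted return seen at step t is gamma^(T-1-t) * R. The sketch below compares two discount factors. The values 1.0 and 0.99 and the helper name `returns_to_go` are assumptions picked for illustration, not the settings I actually used.

```python
def returns_to_go(final_reward, horizon, gamma):
    """Discounted return at each step when the only reward arrives at the last step."""
    return [final_reward * gamma ** (horizon - 1 - t) for t in range(horizon)]


# gamma = 1.0: every timestep sees the same return, so early and late
# actions are weighted equally.
undiscounted = returns_to_go(1.0, 1000, 1.0)

# gamma = 0.99: the signal reaching the first step is attenuated by
# 0.99 ** 999, while the last step still sees the full reward.
discounted = returns_to_go(1.0, 1000, 0.99)

first_step_signal = discounted[0]
last_step_signal = discounted[-1]
```

With a discount factor below 1, early actions receive a much weaker learning signal than late ones, which does not match a problem where all timesteps matter equally.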

I tried several deep reinforcement learning (DRL) algorithms, but none of them found a good policy. I suspect the problem is the sparse reward, which arrives only at the end of the episode. Any recommendation for a solution method would be appreciated.

There are 0 answers