Correct approach to improve/retrain an offline model

75 views Asked by Felipe Leite Antunes At 04 December 2020 at 14:40

I have a recommendation system that was trained with Behavior Cloning (BC) on offline data. The data was generated by a supervised learning model and converted to batch format using the approach described here. Currently, the model explores using an epsilon-greedy strategy. I want to migrate from BC to MARWIL by changing the beta parameter.
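For context on what changing beta does: MARWIL weights the log-likelihood of each logged action by an exponentiated advantage, and with beta = 0 every weight equals 1, which is plain behavior cloning. The snippet below is only a minimal NumPy sketch of that weighting. The helper name and the advantage values are made up for illustration, and the normalization constant c is simplified to a fixed scalar rather than the running estimate an actual implementation would maintain.

```python
import numpy as np


def marwil_weights(advantages, beta, c=1.0):
    """Per-sample imitation weights exp(beta * A / c).

    `c` is a fixed scalar here (an assumption for illustration);
    real implementations typically track a running normalizer.
    """
    advantages = np.asarray(advantages, dtype=np.float64)
    return np.exp(beta * advantages / c)


# Illustrative advantage estimates for three logged actions.
advantages = np.array([-1.0, 0.0, 2.0])

# beta = 0: all weights are 1, i.e. behavior cloning.
bc_weights = marwil_weights(advantages, beta=0.0)

# beta > 0: actions with higher advantage get larger weights.
marwil_w = marwil_weights(advantages, beta=1.0)
```

With beta > 0, logged actions that did better than expected are imitated more strongly, so the advantage estimates computed from the logged rewards start to matter.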

There are a few ways to do that:

1. Convert the data employed to train the BC algorithm plus the agent’s new data and retrain from scratch using MARWIL.
2. Convert the new data generated by the agent and put it together with the previous converted data employed to train the BC algorithm, using the input parameter, doing something similar to what is described here, and retrain from scratch using MARWIL.
3. Convert the new data generated by the agent and put it together with the previous converted data employed to train the BC algorithm, using the input parameter, doing something similar to what is described here, and retrain using the restored BC agent using MARWIL.

Questions:

Following option 1:

Given that the new data slice would be very small compared with the previous one, would the model learn anything new? When should we stop using the original data?
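To put a number on how small the new slice is, the expected share of new rows in each training batch can be estimated, assuming rows are drawn uniformly from the merged dataset. The dataset sizes and batch size below are invented purely for illustration, and the helper name is hypothetical.

```python
def expected_new_rows(n_old, n_new, batch_size):
    """Expected share and count of new rows per batch under uniform sampling."""
    share = n_new / (n_old + n_new)
    return share, share * batch_size


# Hypothetical sizes: 1,000,000 original rows, 10,000 new rows.
share, per_batch = expected_new_rows(
    n_old=1_000_000, n_new=10_000, batch_size=2_000
)
print(f"share of new rows: {share:.4f}, expected per batch: {per_batch:.1f}")
```

Under these assumed sizes, only about 1% of each batch comes from the agent's new data, which is the concern behind the question above.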

Following option 2:

Following option 3:

Given that the new data slice would be very small compared with the previous one, would the model learn anything new? When should we stop using the original data? This approach works for trajectories associated with new episode ids, but will it also extend the trajectories of episodes already present in the original batch? Retraining would update the networks’ weights using the new data points, but how many iterations should we use? How can we prevent catastrophic forgetting?
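One possibility for options 2 and 3 is to keep the two datasets separate and pass both through the input parameter with explicit sampling probabilities, since RLlib's offline data API documents `input` as also accepting a dict that maps sources to probabilities. The sketch below only builds such a config dict. The paths, the 0.3 probability, and the helper name are illustrative assumptions, not recommended values.

```python
def build_marwil_config(old_path, new_path, new_data_prob, beta):
    """Build a config fragment that mixes two offline datasets.

    `new_data_prob` is the probability of sampling from the new data;
    the remainder goes to the original BC data.
    """
    if not 0.0 <= new_data_prob <= 1.0:
        raise ValueError("new_data_prob must be in [0, 1]")
    return {
        "beta": beta,  # beta = 0 would reduce MARWIL to behavior cloning
        "input": {
            old_path: 1.0 - new_data_prob,
            new_path: new_data_prob,
        },
    }


# Hypothetical paths; the new data is sampled far more often than its size share.
config = build_marwil_config(
    old_path="/data/bc_batches/*.json",
    new_path="/data/agent_batches/*.json",
    new_data_prob=0.3,
    beta=1.0,
)
```

Sampling the new data more often than its raw share would give it more influence, while continuing to draw from the original data is one common guard against forgetting. Whether any particular split works here would have to be checked empirically.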


There are 0 answers
