How to write a custom policy in tf_agents

Asked by tjt on 02 May 2022 at 17:32

I want to use a contextual bandit agent (the LinearThompsonSampling agent) from TF-Agents.

I am using a custom environment, and my rewards are delayed by 3 days. Hence, for training, the observations come from saved historical tables (predictions generated 3 days ago), together with their corresponding rewards (also stored in the tables).
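A minimal plain-Python sketch of the data preparation described above, assuming a hypothetical table schema (`LoggedPrediction` with `request_id`, `logged_on`, `observation`, `action`) and a rewards lookup keyed by `request_id`. None of these names come from TF-Agents. It keeps only predictions old enough for their 3-day-delayed reward to have arrived:

```python
from dataclasses import dataclass
from datetime import date, timedelta

REWARD_DELAY = timedelta(days=3)  # rewards arrive 3 days after the prediction


@dataclass(frozen=True)
class LoggedPrediction:
    # Hypothetical schema for one row of the historical predictions table.
    request_id: str
    logged_on: date
    observation: tuple  # context features seen when the action was chosen
    action: int         # arm the deployed policy actually chose


@dataclass(frozen=True)
class TrainingExample:
    observation: tuple
    action: int
    reward: float


def build_training_batch(predictions, rewards, today):
    """Join logged predictions with their delayed rewards.

    `rewards` maps request_id -> reward. Only predictions that are at least
    REWARD_DELAY old and have a recorded reward become training examples.
    """
    batch = []
    for p in predictions:
        if today - p.logged_on < REWARD_DELAY:
            continue  # reward not expected yet
        if p.request_id not in rewards:
            continue  # reward missing; skip rather than guess
        batch.append(TrainingExample(p.observation, p.action, rewards[p.request_id]))
    return batch


# Usage: "a" is 4 days old and kept; "b" is only 1 day old and skipped.
preds = [
    LoggedPrediction("a", date(2022, 4, 28), (0.1, 0.5), 2),
    LoggedPrediction("b", date(2022, 5, 1), (0.3, 0.2), 0),
]
batch = build_training_batch(preds, {"a": 1.0, "b": 0.0}, today=date(2022, 5, 2))
```

The important point is that each training example carries the action that was actually logged, not one re-chosen by the current policy.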

Given this, how do I make the policy output, during training only, the action recorded in the historical tables for a given observation? During evaluation I want the policy to behave as usual, choosing actions according to what it has learned.

It looks like I need to write a custom policy that behaves this way during training and behaves like its usual self (linearthompsonsampling.policy) during evaluation. Unfortunately, I couldn't find any examples or documentation for this use case. Can someone explain how to code this? An example would be very useful.
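A plain-Python sketch of the mode-switching behaviour described above. `ReplayOrDelegatePolicy` is a hypothetical wrapper, not a TF-Agents class: in training mode it replays the logged action for an observation, and in evaluation mode it defers to a stand-in for the learned policy (`learned_action_fn`, also an assumption). Keying on the observation assumes each logged context is unique; keying on a row id would be safer in practice.

```python
from typing import Callable, Dict, Hashable


class ReplayOrDelegatePolicy:
    """Hypothetical wrapper, not part of TF-Agents.

    training=True  -> return the action recorded in the historical table.
    training=False -> return the action chosen by the learned policy.
    """

    def __init__(self, learned_action_fn: Callable[[Hashable], int],
                 logged_actions: Dict[Hashable, int]):
        self._learned_action_fn = learned_action_fn  # stand-in for the agent's policy
        self._logged_actions = logged_actions        # observation -> logged action
        self.training = True

    def action(self, observation: Hashable) -> int:
        if self.training:
            try:
                return self._logged_actions[observation]
            except KeyError:
                raise KeyError(f"no logged action for observation {observation!r}") from None
        return self._learned_action_fn(observation)


# Usage with a trivial stand-in "learned" policy that always picks arm 0.
policy = ReplayOrDelegatePolicy(lambda obs: 0, {(0.1, 0.5): 2})
replayed = policy.action((0.1, 0.5))  # 2, replayed from the log
policy.training = False
learned = policy.action((0.1, 0.5))   # 0, from the stand-in policy
```

In TF-Agents specifically, such a wrapper may not be needed at all: as far as I know, agents are trained by calling `agent.train(experience)` on a batch of `tf_agents.trajectories.Trajectory` objects, and the `action` field of that batch can hold the logged actions directly. The learned policy (`agent.policy`) would then only be consulted at evaluation time.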
