I'm currently trying to build an algorithm to maximize the terminal wealth of a portfolio, using the REINFORCE-with-baseline algorithm from Sutton and Barto (2018). One neural network represents the policy: it takes the current wealth and the time left on the investment horizon as inputs, and outputs two values, the mean and standard deviation of a normal distribution. The discounted dollar amount invested in the risky asset is then sampled from this distribution. A second network represents the value function (same inputs, but it outputs the state value).

I have solved the problem analytically, and my value network converges well to the optimal solution. My policy network does not, which leads me to believe I could improve the network's architecture to 'help' it find the optimal solution. I'm reasonably new to PyTorch and neural networks, so I would appreciate ideas on how to do this.

My policy network is below; it has two hidden layers with 32 nodes each. I have also played around with the learning rates, and that does not seem to help much. Thanks!
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class PolicyNetwork(nn.Module):
    ''' Neural network for the policy, which is taken to be normally distributed,
    so this network returns a mean and standard deviation '''
    def __init__(self, lr, input_dims, fc1_dims, fc2_dims, n_returns):
        super(PolicyNetwork, self).__init__()
        self.input_dims = input_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.n_returns = n_returns
        self.lr = lr
        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)  # inputs are wealth and time to maturity
        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        self.fc3 = nn.Linear(self.fc2_dims, n_returns)  # returns mean and sd of the normal dist
        self.optimizer = optim.Adam(self.parameters(), lr=lr)

    def forward(self, observation):
        state = torch.as_tensor(observation, dtype=torch.float32).unsqueeze(0)
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        mean_slice = x[:, 0]
        sd_slice = x[:, 1]
        activated_parts = (
            mean_slice,                # mean is left unconstrained (may be negative)
            #F.relu(mean_slice),       # alternative: force the mean to be positive
            #torch.sigmoid(sd_slice),  # alternative: squash the sd into (0, 1)
            F.softplus(sd_slice)       # make the sd positive without capping it below 1
        )
        # stack keeps the batch dimension; cat along dim=-1 would flatten it away
        out = torch.stack(activated_parts, dim=-1)
        return out
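For context, here is a minimal sketch of how the mean/std output pair feeds a REINFORCE-with-baseline update, using `torch.distributions.Normal` for the sampling and log-probability. The numbers (`G`, `value_estimate`) and the tensors standing in for the network output are purely illustrative:

import torch
import torch.nn.functional as F
from torch.distributions import Normal

# Stand-ins for the two network outputs for one state:
# an unconstrained mean and a softplus-activated standard deviation.
mean = torch.tensor([0.5], requires_grad=True)
raw_std = torch.tensor([0.0], requires_grad=True)
std = F.softplus(raw_std)           # always positive

dist = Normal(mean, std)
action = dist.sample()              # dollar amount invested in the risky asset
log_prob = dist.log_prob(action)

# REINFORCE-with-baseline signal: advantage = G - V(s),
# with a hypothetical return G and baseline value estimate.
G, value_estimate = 1.2, 1.0
advantage = G - value_estimate
loss = -(advantage * log_prob).sum()  # minimize the negative policy-gradient objective
loss.backward()                       # gradients flow into mean and raw_std

In a training loop, `loss.backward()` would be followed by `optimizer.step()` on the policy network's parameters, and the advantage would come from the sampled return minus the value network's prediction.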