If I have two different neural networks (model1 and model2) with two corresponding optimizers, would the operation below, which uses model2.parameters() without detach(), change the gradients of model2? My requirement is to compute the mean squared loss between the parameters of the two models but step only the optimizer for model1, leaving model2 as is.
opt1 = torch.optim.SGD(self.model1.parameters(), lr=1e-3)
opt2 = torch.optim.SGD(self.model2.parameters(), lr=1e-3)

# Squared-L2 penalty between the flattened parameter vectors of the two models
params1 = torch.nn.utils.parameters_to_vector(self.model1.parameters())
params2 = torch.nn.utils.parameters_to_vector(self.model2.parameters())
loss = (self.lamb / 2.) * ((params1 - params2) ** 2).sum()

loss.backward()
opt1.step()
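For comparison, here is a minimal, self-contained sketch of the detach() variant I am considering, where model2's flattened parameters are treated as a constant (the toy Linear modules and the lamb value are just placeholders, not my actual setup):

import torch
from torch.nn.utils import parameters_to_vector

# Placeholder stand-ins for model1 / model2
model1 = torch.nn.Linear(4, 4)
model2 = torch.nn.Linear(4, 4)
lamb = 0.1

opt1 = torch.optim.SGD(model1.parameters(), lr=1e-3)
opt2 = torch.optim.SGD(model2.parameters(), lr=1e-3)

p1 = parameters_to_vector(model1.parameters())
p2 = parameters_to_vector(model2.parameters()).detach()  # cut the graph: no gradient flows back into model2

loss = (lamb / 2.) * ((p1 - p2) ** 2).sum()
loss.backward()
opt1.step()

# Check whether backward() populated any gradients on model2
print(any(p.grad is not None for p in model2.parameters()))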
In general, how can I decide whether or not an operation needs detach()?