I tried to reimplement the visual comparison of the cross-entropy (CE) and mean squared error (MSE) losses from the paper by Glorot and Bengio, Understanding the difficulty of training deep feedforward neural networks. My objective is to reproduce the figure in Sect. 4.1, Fig. 5.
See here: https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
The network description in the paper is a bit ambiguous, and I came up with this PyTorch network:
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 1, bias=False),
    nn.Tanh(),
    nn.Linear(1, 1, bias=False),
    nn.Tanh(),
    nn.Sigmoid()
)
Then I set the parameters with this function:
def set_weights(model, w1, w2):
    (w1, w2) = torch.FloatTensor([w1, w2])
    model[0].weight = nn.Parameter(torch.Tensor([[w1]]))
    model[2].weight = nn.Parameter(torch.Tensor([[w2]]))
    return model
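As a quick sanity check (the values are arbitrary, not from the paper), with these two pieces the network computes sigmoid(tanh(w2 · tanh(w1 · x))):

# Sanity check: with weights w1 and w2 set, the network above
# computes sigmoid(tanh(w2 * tanh(w1 * x))).
model = set_weights(model, 1.5, -2.0)
x = torch.tensor([[0.3]])
print(model(x))                                               # forward pass
print(torch.sigmoid(torch.tanh(-2.0 * torch.tanh(1.5 * x))))  # same value by hand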
The description of the loss computation in the paper is even more ambiguous. I tried many possible loss computations between the input and the output, and the one that comes closest to their Fig. 5 is this one:
def losses(model, trials=10000):
    """
    The random input is the truth
    """
    p = torch.unsqueeze(torch.rand(trials), dim=1)  # random input signal, prob. relative to class 1
    # m = model(torch.unsqueeze(torch.rand(trials), dim=1))
    m = model(p)
    p_0 = (p < 0.5).float()   # class 0
    p_1 = (p >= 0.5).float()  # class 1
    p = p_0 * (1 - p) + p_1 * p  # prob. relative to class 0 and 1
    m = p_0 * (1 - m) + p_1 * m
    ce = -torch.mean(p * torch.log(m))
    mse = torch.mean((p - m)**2)
    return ce, mse
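I also tried PyTorch's built-in loss functions directly on the input/output pair (this is what the second notebook linked at the end does). A minimal sketch of that variant; note it is not strictly identical to the ce above, since nn.BCELoss computes -[p*log(m) + (1-p)*log(1-m)]:

# Variant using PyTorch's built-in losses directly on the input/output pair.
def losses_builtin(model, trials=10000):
    bce_loss = nn.BCELoss()
    mse_loss = nn.MSELoss()
    p = torch.unsqueeze(torch.rand(trials), dim=1)  # random input, treated as the target
    m = model(p)                                    # network output
    return bce_loss(m, p), mse_loss(m, p)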
I varied the weights with
ce_l = []
mse_l = []
x_l = torch.linspace(-4, 4, 100)
y_l = torch.linspace(-4, 4, 100)
for w1, w2 in [(w1, w2) for w1 in x_l for w2 in y_l]:
    model = set_weights(model, w1, w2)
    ce, mse = losses(model)
    ce_l += [ce.item()]   # store plain floats for plotting
    mse_l += [mse.item()]
and I plotted the losses on a 3-D figure. But my result is still not as sharp as the original figure (see my figure: Losses).
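For reference, the 3-D plot is roughly this (a sketch with matplotlib; the reshape assumes the 100 × 100 grid above, and the exact styling differs from my notebook):

# Reshape the flat loss lists into the 100 x 100 grid defined by x_l and y_l
# and draw the two loss surfaces side by side.
import matplotlib.pyplot as plt
import numpy as np

W1, W2 = np.meshgrid(x_l.numpy(), y_l.numpy(), indexing='ij')
ce_grid = np.array(ce_l).reshape(100, 100)
mse_grid = np.array(mse_l).reshape(100, 100)

fig = plt.figure(figsize=(12, 5))
for i, (grid, title) in enumerate([(ce_grid, 'Cross entropy'), (mse_grid, 'MSE')], start=1):
    ax = fig.add_subplot(1, 2, i, projection='3d')
    ax.plot_surface(W1, W2, grid, cmap='viridis')
    ax.set_xlabel('w1')
    ax.set_ylabel('w2')
    ax.set_title(title)
plt.show()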
Does anybody have an idea?
I added the code to reproduce my figure here: https://github.com/pnugues/ce_mse/blob/main/ce_mse.ipynb, as well as a notebook that uses PyTorch's loss functions BCELoss and MSELoss as is: https://github.com/pnugues/ce_mse/blob/main/ce_mse_pytorch_losses.ipynb
I found the answer to my question with the help of Xavier Glorot. The model is simpler than the one I used and has no hyperbolic tangents:
model = nn.Sequential(
    nn.Linear(1, 1, bias=False),
    nn.Linear(1, 1, bias=False),
    nn.Sigmoid()
)
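Note that with this model the second linear layer sits at index 1, so the set_weights function above would need its index adjusted. The two bias-free linear layers collapse into a single multiplication, so the network computes sigmoid(w2 · w1 · x). A quick check with illustrative values:

# The two bias-free linear layers collapse into one multiplication:
# model(x) = sigmoid(w2 * w1 * x). The second Linear is model[1] here.
model[0].weight = nn.Parameter(torch.tensor([[2.0]]))
model[1].weight = nn.Parameter(torch.tensor([[-3.0]]))
x = torch.tensor([[0.7]])
print(model(x))                        # forward pass through the simpler network
print(torch.sigmoid(2.0 * -3.0 * x))   # same value in closed form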
Using it, I could replicate the figure. See the notebook here: https://github.com/pnugues/ce_mse/blob/main/ce_mse_pytorch_losses_xglorot.ipynb