Problem replicating a comparison between CE and MSE

I tried to reproduce a visual comparison of the cross-entropy (CE) and mean squared error (MSE) losses from the paper by Glorot and Bengio, Understanding the difficulty of training deep feedforward neural networks. My objective is to obtain the same figure as Fig. 5 in Sect. 4.1.

See here: https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

The network description in the paper is a bit ambiguous, and I came up with this PyTorch network:

model = nn.Sequential(
    nn.Linear(1, 1, bias=False),
    nn.Tanh(),
    nn.Linear(1, 1, bias=False),
    nn.Tanh(),
    nn.Sigmoid()
)
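
If I read the layers correctly, with scalar weights w1 and w2 and no biases, this network computes ŷ = sigmoid(tanh(w2 · tanh(w1 · x))).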

Then I set the parameters with this function:

def set_weights(model, w1, w2):
    # The two Linear layers sit at indices 0 and 2 of the Sequential;
    # wrap each scalar weight in a 1x1 tensor before assigning it
    model[0].weight = nn.Parameter(torch.tensor([[float(w1)]]))
    model[2].weight = nn.Parameter(torch.tensor([[float(w2)]]))
    return model

The description of the loss computation in the paper is even more ambiguous. I tried many possible loss computations between the input and the output, and the one that comes closest to their figure (Fig. 5) is this one:

def losses(model, trials=10000):
    """
    The random input also serves as the target (truth)
    """
    p = torch.unsqueeze(torch.rand(trials), dim=1)  # random input: prob. of class 1
    # m = model(torch.unsqueeze(torch.rand(trials), dim=1))
    m = model(p)
    p_0 = (p < 0.5).float()   # mask for class 0
    p_1 = (p >= 0.5).float()  # mask for class 1
    p = p_0 * (1 - p) + p_1 * p  # prob. of the true class (0 or 1)
    m = p_0 * (1 - m) + p_1 * m  # model prob. of that same class
    ce = -torch.mean(p * torch.log(m))
    mse = torch.mean((p - m) ** 2)
    return ce, mse
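
For comparison, here is a minimal sketch of the same evaluation with PyTorch's built-in criteria (only in the spirit of my ce_mse_pytorch_losses.ipynb notebook, not necessarily its exact computation). Note that nn.BCELoss also includes the (1 - p) * log(1 - m) term, which my manual CE above omits:

import torch
import torch.nn as nn

def losses_builtin(model, trials=10000):
    p = torch.rand(trials, 1)  # random targets in [0, 1]
    with torch.no_grad():      # evaluation only, no gradients needed
        m = model(p)
    ce = nn.BCELoss()(m, p)    # -mean(p*log(m) + (1-p)*log(1-m))
    mse = nn.MSELoss()(m, p)   # mean((m - p)**2)
    return ce, mse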

I varied the weights with

ce_l = []
mse_l = []
x_l = torch.linspace(-4, 4, 100)
y_l = torch.linspace(-4, 4, 100)
for w1, w2 in [(w1, w2) for w1 in x_l for w2 in y_l]:
    model = set_weights(model, w1, w2)
    ce, mse = losses(model)
    ce_l += [ce.item()]    # .item() detaches the scalar so it can be plotted
    mse_l += [mse.item()]

and I plotted the losses as 3-D surfaces. But my result is still not as sharp as the original figure. See here: Losses
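
For completeness, here is a minimal plotting sketch (assuming matplotlib; x_l, y_l, ce_l, and mse_l come from the loop above):

import numpy as np
import matplotlib.pyplot as plt

# The loop iterates w1 in the outer position, so reshaping the flat lists
# to (100, 100) puts w1 along axis 0; indexing='ij' matches that layout
W1, W2 = np.meshgrid(x_l.numpy(), y_l.numpy(), indexing='ij')
CE = np.array(ce_l).reshape(100, 100)
MSE = np.array(mse_l).reshape(100, 100)

fig = plt.figure(figsize=(10, 4))
for i, (Z, title) in enumerate([(CE, 'CE'), (MSE, 'MSE')], start=1):
    ax = fig.add_subplot(1, 2, i, projection='3d')
    ax.plot_surface(W1, W2, Z, cmap='viridis')
    ax.set_xlabel('$w_1$')
    ax.set_ylabel('$w_2$')
    ax.set_title(title)
plt.show()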

Does anybody have an idea?

I added the code to reproduce my figure (https://github.com/pnugues/ce_mse/blob/main/ce_mse.ipynb) as well as a notebook that uses PyTorch's loss functions BCELoss and MSELoss as-is (https://github.com/pnugues/ce_mse/blob/main/ce_mse_pytorch_losses.ipynb).

I found the answer to my question with the help of Xavier Glorot. The model is simpler than the one I used and has no hyperbolic tangents:

model = nn.Sequential(
    nn.Linear(1, 1, bias=False),
    nn.Linear(1, 1, bias=False),
    nn.Sigmoid()
)
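
Since there is no nonlinearity between the two linear layers and no bias, this network simply computes ŷ = sigmoid(w2 · w1 · x): the sigmoid acts on a plain product of the two weights and the input.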

Using this model, I could replicate the figure. See the notebook here: https://github.com/pnugues/ce_mse/blob/main/ce_mse_pytorch_losses_xglorot.ipynb
