I tried to reimplement the visual comparison of the cross-entropy (CE) and mean squared error (MSE) losses from the paper by Glorot and Bengio, Understanding the difficulty of training deep feedforward neural networks. My objective is to reproduce the figure in Sect. 4.1, Fig. 5.
See here: https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
The network description in the paper is a bit ambiguous, and I came up with this PyTorch network:
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 1, bias=False),
    nn.Tanh(),
    nn.Linear(1, 1, bias=False),
    nn.Tanh(),
    nn.Sigmoid()
)
Then I set the parameters with this function:
def set_weights(model, w1, w2):
    (w1, w2) = torch.FloatTensor([w1, w2])
    model[0].weight = nn.Parameter(torch.Tensor([[w1]]))
    model[2].weight = nn.Parameter(torch.Tensor([[w2]]))
    return model
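As a quick sanity check (the values are arbitrary, not from the paper), with these two pieces the network computes sigmoid(tanh(w2 · tanh(w1 · x))):

# Sanity check: with weights w1 and w2 set, the network above
# computes sigmoid(tanh(w2 * tanh(w1 * x))).
model = set_weights(model, 1.5, -2.0)
x = torch.tensor([[0.3]])
print(model(x))                                               # forward pass
print(torch.sigmoid(torch.tanh(-2.0 * torch.tanh(1.5 * x))))  # same value by hand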
The description of the loss computation in the paper is even more ambiguous. I tried many possible loss computations between the input and the output, and the one that comes closest to their Fig. 5 is this one:
def losses(model, trials=10000):
    """
    The random input is the truth
    """
    p = torch.unsqueeze(torch.rand(trials), dim=1)  # random input signal, prob. relative to class 1
    # m = model(torch.unsqueeze(torch.rand(trials), dim=1))
    m = model(p)
    p_0 = (p < 0.5).float()   # class 0
    p_1 = (p >= 0.5).float()  # class 1
    p = p_0 * (1 - p) + p_1 * p  # prob. relative to class 0 and 1
    m = p_0 * (1 - m) + p_1 * m
    ce = -torch.mean(p * torch.log(m))
    mse = torch.mean((p - m)**2)
    return ce, mse
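I also tried PyTorch's built-in loss functions directly on the input/output pair (this is what the second notebook linked at the end does). A minimal sketch of that variant; note it is not strictly identical to the ce above, since nn.BCELoss computes -[p*log(m) + (1-p)*log(1-m)]:

# Variant using PyTorch's built-in losses directly on the input/output pair.
def losses_builtin(model, trials=10000):
    bce_loss = nn.BCELoss()
    mse_loss = nn.MSELoss()
    p = torch.unsqueeze(torch.rand(trials), dim=1)  # random input, treated as the target
    m = model(p)                                    # network output
    return bce_loss(m, p), mse_loss(m, p)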
I varied the weights with
ce_l = []
mse_l = []
x_l = torch.linspace(-4, 4, 100)
y_l = torch.linspace(-4, 4, 100)
for w1, w2 in [(w1, w2) for w1 in x_l for w2 in y_l]:
    model = set_weights(model, w1, w2)
    ce, mse = losses(model)
    ce_l += [ce.item()]   # store plain floats for plotting
    mse_l += [mse.item()]
and I plotted the losses on a 3-D figure. But my result is still not as sharp as the original figure (see my figure: Losses).
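For reference, the 3-D plot is roughly this (a sketch with matplotlib; the reshape assumes the 100 × 100 grid above, and the exact styling differs from my notebook):

# Reshape the flat loss lists into the 100 x 100 grid defined by x_l and y_l
# and draw the two loss surfaces side by side.
import matplotlib.pyplot as plt
import numpy as np

W1, W2 = np.meshgrid(x_l.numpy(), y_l.numpy(), indexing='ij')
ce_grid = np.array(ce_l).reshape(100, 100)
mse_grid = np.array(mse_l).reshape(100, 100)

fig = plt.figure(figsize=(12, 5))
for i, (grid, title) in enumerate([(ce_grid, 'Cross entropy'), (mse_grid, 'MSE')], start=1):
    ax = fig.add_subplot(1, 2, i, projection='3d')
    ax.plot_surface(W1, W2, grid, cmap='viridis')
    ax.set_xlabel('w1')
    ax.set_ylabel('w2')
    ax.set_title(title)
plt.show()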
Does anybody have an idea?
I added the code to reproduce my figure here: https://github.com/pnugues/ce_mse/blob/main/ce_mse.ipynb, as well as a notebook that uses PyTorch's loss functions BCELoss and MSELoss as is: https://github.com/pnugues/ce_mse/blob/main/ce_mse_pytorch_losses.ipynb
I found the answer to my question with the help of Xavier Glorot. The model is simpler than the one I used and has no hyperbolic tangents:
model = nn.Sequential(
    nn.Linear(1, 1, bias=False),
    nn.Linear(1, 1, bias=False),
    nn.Sigmoid()
)
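Note that with this model the second linear layer sits at index 1, so the set_weights function above would need its index adjusted. The two bias-free linear layers collapse into a single multiplication, so the network computes sigmoid(w2 · w1 · x). A quick check with illustrative values:

# The two bias-free linear layers collapse into one multiplication:
# model(x) = sigmoid(w2 * w1 * x). The second Linear is model[1] here.
model[0].weight = nn.Parameter(torch.tensor([[2.0]]))
model[1].weight = nn.Parameter(torch.tensor([[-3.0]]))
x = torch.tensor([[0.7]])
print(model(x))                        # forward pass through the simpler network
print(torch.sigmoid(2.0 * -3.0 * x))   # same value in closed form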
Using it, I could replicate the figure. See the notebook here: https://github.com/pnugues/ce_mse/blob/main/ce_mse_pytorch_losses_xglorot.ipynb