I would like to fit a PyTorch feed-forward network on a crafted dataset where the label y depends on two features of the dataset.

The dataset is generated with np.random.random_sample, which draws features uniformly from [0, 1), and the label is computed with one of the two functions below:

  • sum_bin_label
  • sum_mod_label

For the first function, both training and validation loss of the neural network decrease and the model eventually approximates the function with close to 100% accuracy, as expected. But for the second function, which takes the sum of the two features modulo NUM_CLASSES, it makes no progress at all. I have tried multiple learning rates and network architectures but did not manage to fit it.

I am interested in seeing how that function can be fitted.

Below is a simple example that can be pasted directly into a Jupyter notebook or any Python REPL.

Thanks in advance!

Imports

import torch
import numpy as np
from sklearn.model_selection import train_test_split
import torch.utils.data as utils

DATASHAPE = (2000, 2)
NUM_CLASSES = 3

Functions and classes used

def sum_mod_label(x):
    # label = int(100 * (x0 + x1)) % NUM_CLASSES, so the class cycles
    # with every 0.01 change in the feature sum
    return np.array([x for x in map(
        lambda x: x % NUM_CLASSES, map(int, (x[:, 0] + x[:, 1]) * 100))])

def sum_bin_label(x):
    # label = which of three contiguous bins the feature sum falls into
    def binit(x):
        if x < 0.807:
            return 0
        if x < 1.169:
            return 1
        return 2

    return np.array(
        [x for x in map(lambda x: binit(x), x[:, 0] + x[:, 1])])
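
For intuition, here is a quick check of my own (not part of the original setup) that evaluates both functions on a few nearby points. The bin label is stable across neighbouring points, while the modulo label cycles with every 0.01 change in the feature sum, which is what makes it hard to learn.

pts = np.array([[0.40, 0.40],
                [0.40, 0.41],
                [0.40, 0.42],
                [0.40, 0.43]])
print(sum_bin_label(pts))  # e.g. [0 1 1 1], nearly constant for neighbouring points
print(sum_mod_label(pts))  # e.g. [2 0 1 2], cycles with every 0.01 change in the sum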

class RandomModuloDataset(utils.Dataset):
    def __init__(self, shape, label_fn):
        self.data = np.random.random_sample(shape)
        self.label = label_fn(self.data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx, :], self.label[idx]

class FeedForward(torch.nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.input_size = input_size
        self.num_classes = num_classes

        self.relu = torch.nn.ReLU()
        self.softmax = torch.nn.Softmax(dim=-1)

        self.fc1 = torch.nn.Linear(
            self.input_size, self.input_size)

        self.fc2 = torch.nn.Linear(
            self.input_size, self.num_classes)

    def forward(self, x, **kwargs):
        output = self.fc2(self.relu(self.fc1(x.float())))
        return self.softmax(output)

def fitit(trainloader, epochs=10):
    neurons = DATASHAPE[1]

    net = FeedForward(neurons, NUM_CLASSES)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(epochs):
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data

            optimizer.zero_grad()

            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

        # note: this prints the loss of the last batch of the epoch,
        # not an average over the epoch
        print('[%d] loss: %.3f' %
              (epoch + 1, loss.item()))

Iteration with first function (eventually converges)

sum_bin_tloader = utils.DataLoader(
    RandomModuloDataset(DATASHAPE, sum_bin_label))
fitit(sum_bin_tloader, epochs=50)

[1] loss: 1.111
[2] loss: 1.133
[3] loss: 1.212
[4] loss: 1.264
[5] loss: 1.261
[6] loss: 1.199
[7] loss: 1.094
[8] loss: 1.011
[9] loss: 0.958
[10] loss: 0.922
[11] loss: 0.896
[12] loss: 0.876
[13] loss: 0.858
[14] loss: 0.844
[15] loss: 0.831
[16] loss: 0.820
[17] loss: 0.811
[18] loss: 0.803
[19] loss: 0.795
[20] loss: 0.788
[21] loss: 0.782
[22] loss: 0.776
[23] loss: 0.771
[24] loss: 0.766
[25] loss: 0.761
[26] loss: 0.757
[27] loss: 0.753
[28] loss: 0.749
[29] loss: 0.745
[30] loss: 0.741
[31] loss: 0.738
[32] loss: 0.734
[33] loss: 0.731
[34] loss: 0.728
[35] loss: 0.725
[36] loss: 0.722
[37] loss: 0.719
[38] loss: 0.717
[39] loss: 0.714
[40] loss: 0.712
[41] loss: 0.709
[42] loss: 0.707
[43] loss: 0.705
[44] loss: 0.703
[45] loss: 0.701
[46] loss: 0.699
[47] loss: 0.697
[48] loss: 0.695
[49] loss: 0.693
[50] loss: 0.691

Iteration with second function (does not converge)

sum_mod_tloader = utils.DataLoader(
    RandomModuloDataset(DATASHAPE, sum_mod_label))
fitit(sum_mod_tloader, epochs=50)

[1] loss: 1.059
[2] loss: 1.065
[3] loss: 1.079
[4] loss: 1.087
[5] loss: 1.091
[6] loss: 1.092
[7] loss: 1.092
[8] loss: 1.092
[9] loss: 1.092
[10] loss: 1.091
[11] loss: 1.091
[12] loss: 1.091
[13] loss: 1.091
[14] loss: 1.091
[15] loss: 1.090
[16] loss: 1.090
[17] loss: 1.090
[18] loss: 1.090
[19] loss: 1.090
[20] loss: 1.090
[21] loss: 1.090
[22] loss: 1.089
[23] loss: 1.089
[24] loss: 1.089
[25] loss: 1.089
[26] loss: 1.089
[27] loss: 1.089
[28] loss: 1.089
[29] loss: 1.089
[30] loss: 1.089
[31] loss: 1.089
[32] loss: 1.089
[33] loss: 1.089
[34] loss: 1.089
[35] loss: 1.089
[36] loss: 1.089
[37] loss: 1.089
[38] loss: 1.089
[39] loss: 1.089
[40] loss: 1.089
[41] loss: 1.089
[42] loss: 1.089
[43] loss: 1.089
[44] loss: 1.089
[45] loss: 1.089
[46] loss: 1.089
[47] loss: 1.089
[48] loss: 1.089
[49] loss: 1.089
[50] loss: 1.089

I expected to be able to fit both functions, since a neural network should be able to approximate any function y = f(x) describing the dependent variable, but training makes no progress for sum_mod_label.

Using CatBoost I was able to get reasonable accuracy (~75%) on the sum_mod_label data.
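
For reference, the CatBoost baseline might look roughly like the sketch below. This is my own reconstruction, since the exact settings were not given; the iteration count and split are arbitrary choices, and it reuses the imports and label functions defined above.

from catboost import CatBoostClassifier

# Hedged reconstruction of the CatBoost baseline; hyperparameters are guesses.
data = np.random.random_sample(DATASHAPE)
labels = sum_mod_label(data)
X_train, X_val, y_train, y_val = train_test_split(
    data, labels, test_size=0.25, random_state=0)

clf = CatBoostClassifier(iterations=500, verbose=False)
clf.fit(X_train, y_train)
print(clf.score(X_val, y_val))  # the question reports roughly 0.75 accuracy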

1 Answer

Answer by tsavchyn

Just plot your data and you will see that the two functions produce datasets of very different complexity. In the second case the classes are finely interleaved and much harder to separate, so you need to increase your model's capacity.

Ways to increase your model's capacity (a minimal sketch applying some of these follows the list):

  • More layers
  • More hidden units
  • Tweak the batch size
  • Tweak the learning rate; try an LR scheduler
  • Try another optimizer, for example Adam
  • To avoid overfitting in very deep networks, add dropout layers
  • Take a look at the promising Self-Normalizing Neural Networks
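
As a rough illustration of the first few bullets, here is a minimal sketch (my own addition, not from the original answer) of a wider, deeper classifier trained with Adam on mini-batches. It reuses DATASHAPE, NUM_CLASSES, RandomModuloDataset and sum_mod_label from the question; the hidden size, batch size, learning rate and epoch count are illustrative guesses, not tuned values. It also feeds raw logits to CrossEntropyLoss, which applies log-softmax internally, instead of adding a final Softmax layer.

# Hedged sketch: a deeper/wider MLP trained with Adam on mini-batches.
# All hyperparameters below are illustrative guesses, not tuned values.
deep_net = torch.nn.Sequential(
    torch.nn.Linear(DATASHAPE[1], 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, NUM_CLASSES),  # raw logits, no Softmax
)

loader = utils.DataLoader(
    RandomModuloDataset(DATASHAPE, sum_mod_label),
    batch_size=32, shuffle=True)

criterion = torch.nn.CrossEntropyLoss()  # applies log-softmax internally
optimizer = torch.optim.Adam(deep_net.parameters(), lr=1e-3)

for epoch in range(100):
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(deep_net(inputs.float()), labels)
        loss.backward()
        optimizer.step()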

Plotting code:

import numpy as np
import matplotlib.pyplot as plt

DATASHAPE = (2000, 2)
NUM_CLASSES = 3


def sum_mod_label(x):
    return np.array([x for x in map(lambda x: x % NUM_CLASSES, map(int, (x[:, 0] + x[:, 1]) * 100))])


def sum_bin_label(x):
    def binit(x):
        if x < 0.807:
            return 0
        if x < 1.169:
            return 1
        return 2

    return np.array([x for x in map(lambda x: binit(x), x[:, 0] + x[:, 1])])

data = np.random.random_sample(DATASHAPE)
bin_label = sum_bin_label(data)
mod_label = sum_mod_label(data)


def plot_data(data, label, title):
    plt.figure(figsize=(9, 9))
    plt.title(title)
    plt.scatter(data[..., 0], data[..., 1], c=label)
    plt.show()

plot_data(data, bin_label, 'sum_bin_label')
plot_data(data, mod_label, 'sum_mod_label')

Output:

[plot1, plot2: scatter plots of the data coloured by label; sum_bin_label gives three contiguous diagonal bands, while sum_mod_label gives finely interleaved classes]