Extracting the gradient during a PyTorch fit function


I am trying to train a neural network until the L2 norm of its gradient falls below 10e-3; my code therefore needs access to the parameters and gradients computed during the fit process. I keep hitting snags that make me think I am not accessing the parameters or the gradient correctly.

Here is my code:

def get_theta(self):
    theta = self.parameters().detach().cpu
    return theta

def J_loss(self, xb, yb):
    #forward returns x so here it will return x on GPU
    #return cross_entropy result of xb and yb on GPU
    return F.cross_entropy(self.forward(xb.to(device)), yb.to(device))

def fit(self, loader, epochs = 1999):
    norm2Gradient = 1
    while norm2Gradient >10e-3 and epochs <2000:
        #grad = []
        for _, batch in enumerate(loader):
            x, y = batch['x'], batch['y']
            #computes f.cross_entropy loss of (xb,yb) on GPU 
            loss = self.J_loss(x,y) 
            #print("loss:", loss)
            #computes new gradients
            grad = loss.backward()
            #print("grad:",grad)
            print("grad?",grad)
            #takes one step along new gradients to decrease the loss; updates parameters 
            self.optimizer.step()  
            #captures new parameters
            theta = self.parameters()
            print("theta:",theta)
            #collects gradient along new parameters
            for param in theta:
                grad.append(param.grad)
            #computes gradient norm
            norm2Gradient = torch.linalg.norm(grad)
            sumNorm2Gradient += norm2Gradient.detach().cpu
            #clears out old gradients  
            self.optimizer.zero_grad()
    return sumNorm2Gradient

The current error, "AttributeError: 'NoneType' object has no attribute 'append'", occurs at this line:

grad.append(param.grad)

Additionally, printing the variable grad outputs None. I have combed through the documentation trying to work out what each line of the code does and how to extract the gradient and the parameters. How do I correctly get at the gradient?


There are 2 answers

Ivan answered:

You defined grad the following way in your code:

        grad = loss.backward()

You are getting this error because torch.Tensor.backward returns None; the gradients it computes are written into the .grad attribute of each parameter instead.
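
To illustrate (a minimal standalone sketch, not part of the original answer; the toy model and data here are made up): after calling backward(), each parameter's gradient can be read from its .grad attribute.

import torch

model = torch.nn.Linear(4, 2)          # toy model for illustration
x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()                        # populates p.grad for each parameter; returns None

# Collect the gradients from the parameters themselves.
grads = [p.grad for p in model.parameters() if p.grad is not None]
print(grads[0].shape)                  # gradient of the weight matrix: torch.Size([2, 4])

Any parameter that did not take part in the computation keeps grad = None, hence the guard in the list comprehension.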

Karl answered:

The issue is here:

for _, batch in enumerate(loader):
    ...
    grad = loss.backward()
    ...
    for param in theta:
        grad.append(param.grad)
    ...

backward does not return a value. When you run grad = loss.backward(), you are assigning grad = None. Later, you attempt to append values to None via grad.append(param.grad), hence the error. Call loss.backward() for its side effect only, then read the gradients from each parameter's .grad attribute.
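
Putting that together, the loop in the question could be rewritten along these lines (a sketch assuming the same self.optimizer, self.J_loss, and batch layout from the question, with the 10e-3 threshold kept as a tol parameter):

import torch

def fit(self, loader, epochs=2000, tol=10e-3):
    norm2Gradient = float('inf')
    epoch = 0
    while norm2Gradient > tol and epoch < epochs:
        for batch in loader:
            x, y = batch['x'], batch['y']
            self.optimizer.zero_grad()       # clear gradients from the previous step
            loss = self.J_loss(x, y)
            loss.backward()                  # fills p.grad for each parameter; returns None
            self.optimizer.step()
            # Flatten every parameter gradient into one vector and take its L2 norm.
            grads = [p.grad.detach().flatten()
                     for p in self.parameters() if p.grad is not None]
            norm2Gradient = torch.linalg.norm(torch.cat(grads)).item()
        epoch += 1
    return norm2Gradient

Note that torch.linalg.norm expects a tensor, not a Python list, so the per-parameter gradients are flattened and concatenated into a single vector before taking the norm.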