How do I handle the size of the softmax derivative matrix during backpropagation in a neural network?


So I have an input layer, two hidden layers and an output layer. The forward pass makes sense to me, and after following a tutorial online I've got the basic backpropagation steps down. I'm using a softmax function on the output layer, and I've realised that the tutorial doesn't include the derivative of the softmax function in the backpropagation steps at all. It just leaves that part out.

Here is the code I'm using for the forward and backward prop:

FORWARD:

z1 = np.dot(X_train,w1) + b1
a1 = ReLU(z1)

z2 = np.dot(a1,w2) + b2
a2 = ReLU(z2)

z3 = np.dot(a2,w3) + b3
a3 = softmax(z3)

error = a3 - y_train
dcost = error  # the backward pass below refers to this gradient as dcost

BACKWARD:

dw3 = np.dot(dcost.T,a2).T
dw2 = np.dot( (np.dot(dcost,w3.T)*dReLU(z2)).T,a1).T
dw1 = np.dot((np.dot(np.dot(error,w3.T)*dReLU(z2),w2.T)*dReLU(z1)).T,X_train).T

db3 = np.sum(dcost,axis=0)
db2 = np.sum(np.dot(dcost,w3.T)*dReLU(z2),axis=0)
db1 = np.sum((np.dot((np.dot(dcost,w3.T)*dReLU(z2)),w2.T)*dReLU(z1)),axis=0)

w3 = w3 - lr*dw3
w2 = w2 - lr*dw2
w1 = w1 - lr*dw1

b3 = b3 - lr*db3
b2 = b2 - lr*db2
b1 = b1 - lr*db1
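One way to check whether the backward pass above already accounts for the softmax is a finite-difference gradient check on w3. This is a minimal sketch under stated assumptions: the loss is taken to be cross-entropy summed (not averaged) over the batch, ReLU is assumed to be np.maximum(0, z), and the layer sizes, `relu` and `forward_loss` are hypothetical stand-ins, since the question doesn't show them. It compares the question's dw3 = np.dot(dcost.T, a2).T, with dcost = a3 - y, against numerically estimated gradients.

```python
import numpy as np

def relu(z):
    # Assumed ReLU definition (the question's ReLU is not shown).
    return np.maximum(0, z)

def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def forward_loss(X, y, w1, b1, w2, b2, w3, b3):
    # Same layer structure as the question; ASSUMES summed cross-entropy loss.
    a1 = relu(X @ w1 + b1)
    a2 = relu(a1 @ w2 + b2)
    a3 = softmax(a2 @ w3 + b3)
    loss = -np.sum(y * np.log(a3))
    return loss, a2, a3

rng = np.random.default_rng(1)
n, d_in, h1, h2, k = 8, 5, 6, 7, 4          # hypothetical sizes
X = rng.normal(size=(n, d_in))
y = np.eye(k)[rng.integers(0, k, size=n)]   # hypothetical one-hot labels
w1, b1 = rng.normal(scale=0.5, size=(d_in, h1)), np.zeros(h1)
w2, b2 = rng.normal(scale=0.5, size=(h1, h2)), np.zeros(h2)
w3, b3 = rng.normal(scale=0.5, size=(h2, k)), np.zeros(k)

loss, a2, a3 = forward_loss(X, y, w1, b1, w2, b2, w3, b3)
dcost = a3 - y
dw3 = np.dot(dcost.T, a2).T                 # analytic gradient, as in the question

# Central finite differences on every entry of w3.
eps = 1e-6
num_dw3 = np.zeros_like(w3)
for idx in np.ndindex(w3.shape):
    w_plus, w_minus = w3.copy(), w3.copy()
    w_plus[idx] += eps
    w_minus[idx] -= eps
    loss_plus = forward_loss(X, y, w1, b1, w2, b2, w_plus, b3)[0]
    loss_minus = forward_loss(X, y, w1, b1, w2, b2, w_minus, b3)[0]
    num_dw3[idx] = (loss_plus - loss_minus) / (2 * eps)

# True if the analytic dw3 matches the numerical estimate.
print(np.allclose(dw3, num_dw3, atol=1e-5))
```

If the check passes under this loss, then dw3 is correct without any explicit dsoftmax term.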

And here are my softmax and softmax-derivative definitions:

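import numpy as np

def softmax(z):
    # Subtract each row's max for numerical stability, then normalise each row to sum to 1
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def dsoftmax(softmax):
    # Jacobian diag(s) - s s^T of softmax at ONE output vector s (expects softmax output, not logits)
    # reshape(-1, 1) flattens everything it is given into a single column
    s = softmax.reshape(-1, 1)
    return np.diagflat(s) - np.dot(s, s.T)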
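Because dsoftmax flattens its input with reshape(-1, 1), a (32, 10) batch is treated as a single 320-element vector. A minimal sketch, assuming the goal is one (10, 10) Jacobian per sample instead, stacks them into a (32, 10, 10) array with np.einsum; batch_softmax_jacobian is a hypothetical helper name, not part of the original code.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def batch_softmax_jacobian(s):
    """Per-sample softmax Jacobians for a batch of softmax outputs.

    s has shape (batch, k); the result has shape (batch, k, k), where
    result[n] = diag(s[n]) - outer(s[n], s[n]).
    """
    k = s.shape[1]
    diag = np.einsum('ij,jk->ijk', s, np.eye(k))   # diag(s[n]) for every n
    outer = np.einsum('ij,ik->ijk', s, s)          # s[n] s[n]^T for every n
    return diag - outer

rng = np.random.default_rng(0)
z3 = rng.random((32, 10))
a3 = softmax(z3)                  # pass softmax outputs, not logits
J = batch_softmax_jacobian(a3)
print(J.shape)                    # (32, 10, 10)
```

Each J[n] is what dsoftmax(a3[n]) would return for a single row.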

As an example, in the backpropagation for dw1 I tried inserting dsoftmax where it should go according to my maths (the error*dsoftmax(z3) term):

dw1 = np.dot((np.dot(np.dot(error*dsoftmax(z3),w3.T)*dReLU(z2),w2.T)*dReLU(z1)).T,X_train).T

The problem is that z3 has shape (32,10) while dsoftmax(z3) has shape (320,320), which obviously doesn't work when I try to multiply them together.
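With per-sample Jacobians, the chain rule step is a per-sample matrix-vector product rather than an elementwise *. A minimal sketch, assuming the loss is categorical cross-entropy with one-hot labels (the question doesn't say which loss the tutorial uses): the upstream gradient is dL/da3 = -y / a3, and pushing it through each (10, 10) Jacobian gives exactly a3 - y. If that assumption holds, the softmax derivative is already folded into error = a3 - y_train. The helper names and the random labels are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def batch_softmax_jacobian(s):
    # Stack diag(s[n]) - outer(s[n], s[n]) for every row n: shape (batch, k, k).
    k = s.shape[1]
    return np.einsum('ij,jk->ijk', s, np.eye(k)) - np.einsum('ij,ik->ijk', s, s)

rng = np.random.default_rng(0)
batch, k = 32, 10
z3 = rng.normal(size=(batch, k))
y = np.eye(k)[rng.integers(0, k, size=batch)]   # hypothetical one-hot labels

a3 = softmax(z3)
dL_da3 = -y / a3                  # gradient of -sum(y * log(a3)) w.r.t. a3
J = batch_softmax_jacobian(a3)    # shape (32, 10, 10)

# Per-sample vector-Jacobian product: dL_dz3[n] = J[n].T @ dL_da3[n]
dL_dz3 = np.einsum('nij,ni->nj', J, dL_da3)

print(dL_dz3.shape)                   # (32, 10)
print(np.allclose(dL_dz3, a3 - y))    # True
```

Algebraically, for each sample, sum_i (-y_i / a_i)(a_i * delta_ij - a_i * a_j) = a_j - y_j because the one-hot y sums to 1, which is why the explicit Jacobian cancels out under this loss.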

I can't find any help online, and I'm quite new to this, so I have no idea what to do. I must be missing something major.

To reproduce the matrix sizes you could use:

z3 = np.random.rand(32,10)
print(dsoftmax(z3).shape)  # (320, 320)

Thank you
