I'm trying to implement the derivative matrix of softmax function (Jacobian matrix of Softmax).
I know that mathematically the derivative of Softmax(x_i) with respect to x_j is:

∂σ(x_i)/∂x_j = σ(x_i) * (δ_ij − σ(x_j))

where δ_ij is the Kronecker delta.
So far what I have implemented is:
import numpy as np

def softmax_grad(s):
    # input s is the softmax value of the original input x; its shape is (n,)
    # e.g. s = np.array([0.3, 0.7]), x = np.array([0, 1])

    # initialise an n x n matrix (every entry is overwritten in the loops below)
    jacobian_m = np.diag(s)

    for i in range(len(jacobian_m)):
        for j in range(len(jacobian_m)):
            if i == j:
                jacobian_m[i][j] = s[i] * (1 - s[i])
            else:
                jacobian_m[i][j] = -s[i] * s[j]
    return jacobian_m
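For reference, the softmax used in the test below is assumed to be the standard softmax; a minimal sketch that reproduces the values shown:

def softmax(x):
    # subtract the max for numerical stability; this does not change the result
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()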
When I test:
In [95]: x
Out[95]: array([1, 2])
In [96]: softmax(x)
Out[96]: array([ 0.26894142, 0.73105858])
In [97]: softmax_grad(softmax(x))
Out[97]:
array([[ 0.19661193, -0.19661193],
       [-0.19661193,  0.19661193]])
How do you implement the Jacobian? I'd like to know if there is a better way to do this. Any reference to a website/tutorial would be appreciated as well.
You can vectorize softmax_grad like the following.

Details: sigma(j) * delta(ij) is a diagonal matrix with sigma(j) as the diagonal elements, which you can create with np.diagflat(s); sigma(j) * sigma(i) is a matrix multiplication (or outer product) of the softmax with itself, which can be calculated using np.dot:
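A minimal sketch along those lines (the name softmax_grad_vec is my own; it assumes s is the 1-D softmax output):

def softmax_grad_vec(s):
    # reshape to a column vector so the outer product below has shape (n, n)
    s = s.reshape(-1, 1)
    # np.diagflat(s) is the sigma(j) * delta(ij) diagonal term,
    # np.dot(s, s.T) is the sigma(i) * sigma(j) outer-product term
    return np.diagflat(s) - np.dot(s, s.T)

For the example above, softmax_grad_vec(softmax(np.array([1, 2]))) returns the same 2x2 matrix as the loop-based softmax_grad.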