When trying to get cross-entropy with sigmoid activation function, there is a difference between
loss1 = -tf.reduce_sum(p*tf.log(q), 1)
loss2 = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=p, logits=logit_q),1)
But they are the same when with softmax activation function.
Following is the sample code:
import tensorflow as tf
sess2 = tf.InteractiveSession()
p = tf.placeholder(tf.float32, shape=[None, 5])
logit_q = tf.placeholder(tf.float32, shape=[None, 5])
q = tf.nn.sigmoid(logit_q)
sess.run(tf.global_variables_initializer())
feed_dict = {p: [[0, 0, 0, 1, 0], [1,0,0,0,0]], logit_q: [[0.2, 0.2, 0.2, 0.2, 0.2], [0.3, 0.3, 0.2, 0.1, 0.1]]}
loss1 = -tf.reduce_sum(p*tf.log(q),1).eval(feed_dict)
loss2 = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=p, logits=logit_q),1).eval(feed_dict)
print(p.eval(feed_dict), "\n", q.eval(feed_dict))
print("\n",loss1, "\n", loss2)
you can understand differences between softmax and sigmoid cross entropy in following way:
so anyway the cross entropy is:
for softmax cross entropy it looks exactly as above formula,
but for sigmoid, it looks a little different for it has multi binary probability distribution for each binary probability distribution, it is
p and (1-p) you can treat as two class probability within each binary probability distribution