I'm using cross-entropy loss for a multi-class classification task, and I'm a bit confused about the notation when both the true class distribution $y_i$ and the softmax prediction $p_i$ are vectors over $k$ classes. Is the notation
$\ell_{CE}(y_i, p_i) = -\sum_{i=1}^k y_i \log p_i$
correct for handling these vectors, or should I add an extra subscript $j$ for the class index, like this?
$\ell_{CE}(y_i, p_i) = -\sum_{j=1}^{k} y_{ij} \log p_{ij}$
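To make the indexing concrete, here's a small NumPy sketch of how I'm computing this (the array names `y`, `p`, and the toy values are just my example; rows are samples $i$, columns are classes $j$):

```python
import numpy as np

# Batch of n = 2 samples with k = 3 classes.
# Each row of y is a (one-hot) true class distribution y_i,
# and each row of p is the corresponding softmax output p_i.
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])

# Per-sample cross-entropy: for each sample i, sum over the
# class index j of y_{ij} * log(p_{ij}).
losses = -np.sum(y * np.log(p), axis=1)
```

So the sum in the loss runs over the class index within one sample, while $i$ picks out which sample (row) I'm looking at.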
Thanks!

