I have a Keras neural network with 26 features and 100 targets that I want to explain with the SHAP Python library. To draw a force plot, for instance, I do:
shap.force_plot(exp.expected_value[i], shap_values[j][k], x_val.columns)
Where:
exp.expected_value is a list of size 100 with the base values for each of my targets (this is at least what I understand). The index i refers to the i-th target, I assume.
shap_values refers to the Shapley values of all the features for each of the targets in each validation case. Therefore, j runs from 0 to 99 (one index per target) and k runs over the validation cases.
What I find confusing is that i and j can actually be different and I do get a plot that looks OK. However, shouldn't they always be the same index? Shouldn't the i-th baseline target always be compared to the shap values of the i-th target?
Am I understanding the indices wrong?
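i and j should be the same, because you're plotting how the i-th target is affected by the features, from the base value to the predicted value:
shap.force_plot(exp.expected_value[i], shap_values[i][k], x_val.columns)
where i is the index of the target and k is the index of the sample.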
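To see why mixing the indices gives a plot that is drawn but not meaningful, here is a minimal sketch. It does not use the Keras network or the SHAP library: it assumes a small hypothetical linear model as a stand-in, for which the SHAP values under feature independence have the closed form w * (x - mean(x)). All names (W, b, X, expected_value) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_targets, num_samples, num_features = 3, 5, 4

# Hypothetical stand-in for the network: a linear model f(x) = x @ W.T + b.
W = rng.normal(size=(num_targets, num_features))
b = rng.normal(size=num_targets)
X = rng.normal(size=(num_samples, num_features))
preds = X @ W.T + b  # shape (num_samples, num_targets)

# Closed-form SHAP values of a linear model with independent features,
# using the mean of X as the background.
x_mean = X.mean(axis=0)
expected_value = W @ x_mean + b  # shape (num_targets,)
shap_values = W[:, None, :] * (X - x_mean)[None, :, :]  # (targets, samples, features)

i, j, k = 0, 1, 2

# Same target index for base value and SHAP values: recovers the prediction.
matched = expected_value[i] + shap_values[i][k].sum()

# Mixed indices: base value of target i plus contributions for target j.
mixed = expected_value[i] + shap_values[j][k].sum()

print("prediction for target i:", preds[k, i])
print("matched indices:        ", matched)
print("mixed indices:          ", mixed)
```

With matched indices the base value plus the contributions equals the model output for target i. With mixed indices the sum corresponds to no model output at all, even though a force plot can still be drawn from it.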
The reason is that exp.expected_value has shape [num_targets] and holds the base values that the SHAP values are added to, while shap_values, once converted to a NumPy array, has shape [num_targets, num_samples, num_features]. So, e.g., to recover the model output for the k-th datapoint in raw space from its SHAP values, one would sum them over the features and add the base values:
shap_values[:, k, :].sum(1) + exp.expected_value
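and for models using softmax, one would get to probability space by applying softmax (e.g. scipy.special.softmax) to the raw-space values:
softmax(shap_values[:, k, :].sum(1) + exp.expected_value)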
Note that this assumes shap_values is a NumPy array; if the explainer returns a list with one array per target, convert it first with np.array(shap_values).
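The reconstruction in raw and probability space can be sketched as follows. This is only an illustration under stated assumptions: the model is a hypothetical linear layer whose outputs are treated as logits, the SHAP values are computed in closed form rather than by an explainer, and they start out as a list of per-target arrays to mimic an explainer that returns a list. The softmax here is a small NumPy helper written for the example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
num_targets, num_samples, num_features = 3, 6, 4

# Hypothetical model: linear logits, followed by softmax.
W = rng.normal(size=(num_targets, num_features))
b = rng.normal(size=num_targets)
X = rng.normal(size=(num_samples, num_features))
logits = X @ W.T + b  # shape (num_samples, num_targets)

# Base values and SHAP values (closed form for a linear model),
# stored as a list with one (num_samples, num_features) array per target.
x_mean = X.mean(axis=0)
base_values = W @ x_mean + b
shap_list = [W[t] * (X - x_mean) for t in range(num_targets)]

# Convert to a single array of shape (num_targets, num_samples, num_features).
shap_values = np.array(shap_list)

k = 2
raw = shap_values[:, k, :].sum(1) + base_values  # raw space, one value per target
probs = softmax(raw)                             # probability space

print("raw outputs:  ", raw)
print("probabilities:", probs)
```

Note that the sum runs over the feature axis for every target at once, so the i-th entry of the result pairs the i-th base value with the i-th target's SHAP values.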
Please ask if something is not clear.