I have a Keras neural network with 26 features and 100 targets that I want to explain with the SHAP Python library. To draw a force plot, for instance, I do:
shap.force_plot(exp.expected_value[i], shap_values[j][k], x_val.columns)
Where:
exp.expected_value is a list of size 100 with the base values for each of my targets (at least, that is what I understand). The index i refers to the i-th target, I assume.
shap_values holds the Shapley values of all the features for each of the targets in each validation case. Therefore, j runs from 0 to 99 (i.e. over my 100 targets) and k runs from 0 to the number of validation cases minus one.
What I find confusing is that i and j can actually be different and I still get a plot that looks OK. However, shouldn't they always be the same index? Shouldn't the base value of the i-th target always be compared with the SHAP values of the i-th target?
Am I understanding the indices wrong?
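i and j should be the same, because you're plotting how the i-th target is affected by the features, going from its base value to its predicted value:
shap.force_plot(exp.expected_value[i], shap_values[i][k], x_val.columns)
where i indexes the target and k indexes the data point.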
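To see why the indices must match, here is a minimal NumPy sketch that uses neither SHAP nor Keras. It assumes a hypothetical linear model standing in for the network (for a linear model with independent features, the exact SHAP value of feature f for target t is W[t, f] * (x[f] - mean[f])) and mimics the explainer's output as a list with one (num_samples, num_features) array per target. The names model, W, b and background_mean are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the Keras model: a linear map with 26 features
# and 100 targets (shapes taken from the question).
rng = np.random.default_rng(0)
n_samples, n_features, n_targets = 50, 26, 100

X = rng.normal(size=(n_samples, n_features))   # stand-in for x_val
W = rng.normal(size=(n_targets, n_features))
b = rng.normal(size=n_targets)

def model(x):
    """Raw output for every target, shape (..., n_targets)."""
    return x @ W.T + b

background_mean = X.mean(axis=0)

# Mimic the explainer's outputs:
# one base value per target, shape (n_targets,)
expected_value = model(background_mean)
# one (n_samples, n_features) array of exact linear SHAP values per target
shap_values = [W[t] * (X - background_mean) for t in range(n_targets)]

k = 7          # data point
i, j = 3, 42   # two different target indices

pred = model(X[k])  # model output for every target, shape (n_targets,)

# Matching indices: base value of target i + SHAP values of target i
# add up to the model's output for target i.
matched = expected_value[i] + shap_values[i][k].sum()
print(np.isclose(matched, pred[i]))  # True

# Mismatched indices: the sum generally equals neither target i's
# nor target j's output.
mismatched = expected_value[i] + shap_values[j][k].sum()
print(np.isclose(mismatched, pred[i]), np.isclose(mismatched, pred[j]))
```

A force plot still renders with mismatched indices because force_plot only receives a base value and a vector of contributions; it never sees the model, so it cannot check that both belong to the same output.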
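The reason is that exp.expected_value has shape (num_targets,) and holds the base values that the SHAP values are added to, while shap_values, if converted to a NumPy array, has shape [num_targets, num_samples, num_features]. Together with exp.expected_value[i], the SHAP values of target i for data point k add up to the model's output for target i.
So, e.g., to recover the model's raw output (for all targets) for the k-th data point, one would do:
exp.expected_value + shap_values[:, k, :].sum(axis=1)
and for models that apply a softmax, to get to probability space one would do (with softmax being e.g. scipy.special.softmax):
softmax(exp.expected_value + shap_values[:, k, :].sum(axis=1))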
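Note that this assumes shap_values is a NumPy array (e.g. after np.asarray(shap_values)); tuple indexing such as [:, k, :] does not work on a plain Python list.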
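A similar sketch of the vectorized form, assuming the explainer returned a list with one (num_samples, num_features) array per target, as described in the question. The SHAP values and base values below are random placeholders used only to illustrate the shapes; softmax is a small hand-written helper (scipy.special.softmax would also work).

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Placeholder explainer output (random numbers, shapes only):
# a list with one (n_samples, n_features) array per target,
# plus one base value per target.
rng = np.random.default_rng(1)
n_samples, n_features, n_targets = 50, 26, 100
shap_list = [rng.normal(size=(n_samples, n_features)) for _ in range(n_targets)]
expected_value = rng.normal(size=n_targets)

# Stack the list into one array of shape (n_targets, n_samples, n_features)
shap_values = np.asarray(shap_list)

k = 7
# Raw output of every target for data point k:
# base value per target + that target's SHAP values summed over features.
raw_k = expected_value + shap_values[:, k, :].sum(axis=1)  # shape (n_targets,)

# Only meaningful if the explained outputs are logits that the model
# then passes through a softmax.
proba_k = softmax(raw_k)
print(raw_k.shape, proba_k.sum())
```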
Please ask if something is not clear.