How do I properly use shap decision plots and force plots with multiple regression targets?


I have a Keras neural network with 26 features and 100 targets that I want to explain with the SHAP Python library. To draw a force plot, for instance, I do:

shap.force_plot(exp.expected_value[i], shap_values[j][k], x_val.columns)

Where:

  • exp.expected_value is a list of size 100 with the base values for each of my targets (at least, that is my understanding). The index i refers to the i-th target, I assume.
  • shap_values holds the Shapley values of all the features for each of the targets in each validation case. Therefore, j runs from 0 to 99 (one index per target) and k runs from 0 to the number of validation cases minus one.

What I find confusing is that i and j can actually be different and I do get a plot that looks OK. However, shouldn't they always be the same index? Shouldn't the i-th baseline target always be compared to the shap values of the i-th target? Am I understanding the indices wrong?

Answer by Sergey Bushmanov:

i and j should be the same, because you're plotting how the i-th target is pushed by the features from its base value to its predicted value:

shap.force_plot(exp.expected_value[i], shap_values[i][k], x_val.columns)

where:

  • i stands for the i-th target (output)
  • k stands for the k-th sample to be explained.
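Why a mismatched pair of indices still draws a plausible-looking plot can be checked numerically. The snippet below is only a sketch under stated assumptions: it uses NumPy and a hypothetical linear multi-output model (W, b and X are made up) in place of the Keras network, because for a linear model with independent features the SHAP values have a simple closed form. The names expected_value and shap_values mirror the ones in the question but are built by hand here, not returned by a SHAP explainer.

```python
import numpy as np

rng = np.random.default_rng(0)
num_targets, num_samples, num_features = 3, 5, 4

# Hypothetical linear model f(x) = W @ x + b, standing in for the network.
W = rng.normal(size=(num_targets, num_features))
b = rng.normal(size=num_targets)
X = rng.normal(size=(num_samples, num_features))
background_mean = X.mean(axis=0)

# Closed-form SHAP values for a linear model with independent features:
# one base value per target, one (num_samples, num_features) array per target.
expected_value = W @ background_mean + b
shap_values = [W[t] * (X - background_mean) for t in range(num_targets)]

predictions = X @ W.T + b  # shape (num_samples, num_targets)

i, j, k = 0, 1, 2
# Where the force plot ends: base value plus the sum of the plotted SHAP values.
matched = expected_value[i] + shap_values[i][k].sum()
mismatched = expected_value[i] + shap_values[j][k].sum()

matched_ok = bool(np.isclose(matched, predictions[k, i]))
mismatched_ok = bool(
    np.isclose(mismatched, predictions[k, i])
    or np.isclose(mismatched, predictions[k, j])
)
print("matched indices reproduce the prediction:", matched_ok)
print("mismatched indices reproduce a prediction:", mismatched_ok)
```

With matching indices the plot ends at the model's prediction for target i on sample k. With i different from j the plot still renders, but the value it ends at is generally not the output of any target.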

The reason is that exp.expected_value has shape [num_targets] and holds the base values that the SHAP values are added to, while shap_values has shape [num_targets, num_samples, num_features] once converted to a numpy array. The first index of both refers to the same target.

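So, e.g., to recover the model's raw outputs for the k-th data point (the sum of its SHAP values plus the base values, with base_values being exp.expected_value), one would do: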

shap_values[:,k,:].sum(1) + base_values

and, for models that end in a softmax, to get to probability space one would do:

softmax(shap_values[:,k,:].sum(1) + base_values)

Note that this assumes shap_values is a numpy array, e.g. np.array(shap_values) if the explainer returned a list with one array per target.
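The two expressions above can be tried end to end on toy data. This is a minimal sketch, not the output of a real explainer: the linear model (W, b, X) is hypothetical, the SHAP values are computed in closed form for it, and softmax is a small helper defined here. The np.moveaxis step is included on the assumption that some SHAP versions put the target axis last, as [num_samples, num_features, num_targets]; check the shape your version actually returns.

```python
import numpy as np

rng = np.random.default_rng(1)
num_targets, num_samples, num_features = 3, 6, 4

# Hypothetical linear model standing in for the real network.
W = rng.normal(size=(num_targets, num_features))
b = rng.normal(size=num_targets)
X = rng.normal(size=(num_samples, num_features))
mean = X.mean(axis=0)

base_values = W @ mean + b                                   # (num_targets,)
shap_list = [W[t] * (X - mean) for t in range(num_targets)]  # list of per-target arrays

# List of arrays -> one array of shape (num_targets, num_samples, num_features).
shap_arr = np.stack(shap_list)

# If the target axis comes last instead, move it to the front first.
shap_last = np.moveaxis(shap_arr, 0, -1)     # simulate a target-last layout
shap_front = np.moveaxis(shap_last, -1, 0)   # back to target-first

k = 2
# Raw-space outputs for sample k, one value per target.
raw_outputs = shap_arr[:, k, :].sum(1) + base_values
model_outputs = W @ X[k] + b

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Only meaningful when the explained outputs are pre-softmax logits.
probs = softmax(raw_outputs)
print(np.allclose(raw_outputs, model_outputs), probs.sum())
```

For a pure regression network, such as the 100-target model in the question, the raw-space line is the relevant one and the softmax step does not apply.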

Please ask if something is not clear.